Abstract

There is a need to expedite the design of military hardware to stay ahead of the
adversary. The core of this project was to build reusable, synthesizable libraries
that make this possible. To build these libraries, Matlab® commands and functions,
such as Conv2, Round, Floor, and Pinv, had to be converted into reusable VHDL
modules. These modules make up reusable libraries for the Mission Specific Process
(MSP), which will support AFRL/RY.

The MSP allows the VLSI design process to be completed in a matter of days or
months using an FPGA or ASIC design, as opposed to the current development
approach, which can take 1-2 years. Once the libraries are built, the components
can be reused in FPGA or ASIC designs over and over again. The libraries make it
possible to upgrade weapons systems to meet the ever-changing needs the War
Fighter faces. MSP makes it possible to develop various algorithms, including
algorithms implemented in Matlab®. The MSP libraries were built and tested using
the 250-nm technology library from the Taiwan Semiconductor Manufacturing Company
(TSMC). They were also synthesized for an FPGA. All modules were synthesized using
CAD tools from Cadence® and Mentor Graphics®. Power, area, and delay results for
each module are presented.
Hardware Algorithm Implementation for Mission Specific Processing

I.  Introduction 

As technology advances, devices are shrinking and integration densities are
increasing. These trends bring many new challenges in designing integrated
electronic circuits and systems. To achieve high performance (power, speed, dynamic
range, etc.) in new integrated circuits for next-generation systems, new
methodologies must be created, adopted, and executed.

The objective of this research is to examine the current difficulties associated
with modeling and fabricating Very Large-Scale Integration (VLSI) circuits, and then
to provide reusable library cells for the AFRL/RY directorate’s Mission Specific
Process (MSP) in Very High Speed Integrated Circuit (VHSIC) Hardware Description
Language (VHDL). These reusable libraries come in three variations: one optimizing
power, one minimizing area, and one minimizing delay. With these libraries built and
ready to use, system requirements can be upgraded and changed in a matter of days
instead of months or even years. The reusable libraries make it possible to meet the
changing requirements of the operational environment, and the MSP allows changes to
be implemented quickly into a system. The trade-off among power, area, and delay can
be pictured as the points of an equilateral triangle, as seen in Figure 1.1. Each
point of the triangle represents a design in which that parameter has priority. For
example, if you optimize for power, the other two parameters are not the priority,
so the design will have minimal power consumption at the expense of the final design
being large or slow. The blue dot in the triangle represents where the priority lies
while meeting design specifications.

Figure 1.1: Power, Area, and Delay Triangle.

The idea behind the AFRL/RY directorate’s MSP is a threefold approach to designing
circuits. First, it takes time for a design to be developed and fabricated before it
can get into the hands of the customer. Design to market can take 1-2 years,
depending on the technology and the level of difficulty of the circuit design. In
addition, the product can be obsolete in only a few years. This is what makes MSP
unique: it is made up of pre-built, synthesizable, reusable libraries, so the
designer does not have to start from scratch. The designer can quickly use the
pre-built libraries and, when specifications change, build new modules and add them
to the library.

Second, if modifications are required to add features to the product, you have to
wait for the redesign to take place. You have to pay the vendor again for the
changes and wait for the new design to be delivered. MSP reduces the cost and
schedule of redesigning a circuit just to make a few changes for additional
specifications. The reusable libraries make designing a circuit much easier than
with current methodologies. In a sense, it is like putting a Lego® set together:
with all the pre-built parts on hand, it is just a matter of integrating them for
your application.

Third, a Field Programmable Gate Array (FPGA) can replace an Application Specific
Integrated Circuit (ASIC) design, unless you want absolute performance. MSP gives
the designer the flexibility to develop prototypes faster and more cheaply on an
FPGA than with an ASIC design. Having MSP in your toolbox can save time and money
for the next generation of a circuit design. As a result, the War Fighter will have
new equipment in the field in a matter of days, as opposed to the old way of doing
business, which could take 1-2 years for a weapons system to be developed.

This project will support AFRL/RY in their development of a target tracking
project, where the circuits and digital circuits are to be implemented on one chip.
The goals are to provide reliable building blocks as portable, synthesizable,
reusable libraries. These libraries will enable the War Fighter to get an upgradable
weapon system in a matter of a few days or hours and to keep up to date with the
ever-changing Global War on Terrorism.

Building these reusable libraries and making them usable for FPGA or ASIC designs
is beneficial in many ways. For instance, programming the design on an FPGA makes it
possible to protect the design from falling into enemy hands: anti-tamper methods on
the FPGA board protect the design from being discovered by the enemy. Additionally,
placing the design on an FPGA allows the weapons system to be fielded more quickly
than sending the design to a foundry for fabrication as an ASIC, which can take
months because of the foundries’ long lead times. Once the design is fabricated as
an ASIC chip, it has to be thoroughly tested to find faults introduced by the
fabrication process. All of these design steps take time, especially if there are
problems with the completed design.
1.1 Specific Issue

The War on Terrorism has made the military soldier depend on today’s technology:
global positioning systems, radar systems, and various communication devices. These
devices, mobile or not, are required for soldiers to operate in the field and
communicate with Command and Control. For this reason it is essential that these
libraries be built and perfected.

With any circuit design there are three key parameters that designers must weigh:
power, area, and delay. The designer has to make trade-offs among these three
parameters to meet the specific design constraints. Every commercial or military
application has its own specifications for power, area, and delay.

VLSI technology continues to place more and more transistors on a single chip. This
allows chips to become more powerful while their area remains small. The chips
constantly require electrical power to stay operational, which makes it difficult
for War Fighters to carry out their mission without wondering whether their
batteries will last. There is a need to run longer missions and to have
longer-lasting equipment that does not require battery change-outs in the middle of
a critical mission. The low-power optimization in this work uses a 250-nm technology
library [15] from the Taiwan Semiconductor Manufacturing Company (TSMC). This
library provides a starting point for developing circuits with lower power
consumption in the future.

1.2  Problem  Statement 

The problem is to take the AFRL/RY Optical Flow Dense Algorithm written in Matlab®
and convert its commands into synthesizable, reusable library modules written in
VHDL. These synthesizable, reusable libraries lay the foundation for other weapons
systems that require Matlab® commands such as Round, Floor, Two-Dimensional
Convolution (Conv2), and Pseudoinverse (Pinv). The Optical Flow Dense Algorithm can
provide various libraries to handle image processing for Unmanned Aerial Vehicles.
An example of target tracking imagery can be seen in Figure 1.2 [1]. The small
rectangles show targets that can potentially be tracked using Optical Flow.

Figure 1.2: A Sample of Target Tracking Imagery [1].
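As a point of reference for the conversion task, the following Python sketch models what three of the Matlab® commands named in the problem statement compute. It is not the thesis VHDL. The function names (matlab_round, matlab_floor, conv2_full) are hypothetical, and the model assumes Matlab's documented behavior: Round rounds halves away from zero, Floor rounds toward negative infinity, and Conv2 with default arguments returns the full two-dimensional convolution.

```python
import math


def matlab_round(x: float) -> int:
    """Round to nearest integer, halves away from zero (Matlab Round behavior)."""
    magnitude = math.floor(abs(x) + 0.5)
    return magnitude if x >= 0 else -magnitude


def matlab_floor(x: float) -> int:
    """Round toward negative infinity (Matlab Floor behavior)."""
    return math.floor(x)


def conv2_full(a, b):
    """Full 2-D convolution of two matrices given as lists of rows.

    The result has (rows_a + rows_b - 1) rows and (cols_a + cols_b - 1) columns.
    """
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])
    out = [[0] * (cols_a + cols_b - 1) for _ in range(rows_a + rows_b - 1)]
    for i in range(rows_a):
        for j in range(cols_a):
            for m in range(rows_b):
                for n in range(cols_b):
                    # Each input sample spreads the kernel over the output.
                    out[i + m][j + n] += a[i][j] * b[m][n]
    return out


if __name__ == "__main__":
    print(matlab_round(2.5), matlab_round(-2.5), matlab_floor(-2.5))
    print(conv2_full([[1, 2], [3, 4]], [[1, 1], [1, 1]]))
```

A software model of this kind could serve as a golden reference when checking the outputs of a hardware module against expected values.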

1.3  Scope  and  Assumptions 

It is assumed that the reader has knowledge of VLSI technology and understands
VHDL, scripting, and integrating scripts into the Cadence® software or Modelsim®.
The main software programs used for this research are Modelsim® and the software
tools from Mentor Graphics® and Cadence®. The simulations will be run in Modelsim®
to verify the Register Transfer Level (RTL) coding. The Cadence® and Mentor
Graphics® software tools will be used to verify that the modules are synthesizable.

1.4  Thesis  Organization 

Chapter 2 of this thesis gives the background information required to understand
the technology options that are available to reduce power consumption. Each option
is briefly explained, and the option that is the main focus of this research is
discussed in further detail. Background information is also given to support the
design decisions used in Chapter 3. Chapter 3 discusses the theory and methods that
were used for this thesis project. Chapter 4 presents, analyzes, and discusses the
results gathered throughout this research. Finally, Chapter 5 discusses future work
and topics. The Modelsim® VHDL code is located in Appendix C.
II. Background


This chapter gives an overview of the background information used throughout this
research. The MSP concept is flexible enough to accommodate the ever-changing Air
Force missions. The chapter presents an overview of the VLSI Design Process followed
by a more in-depth view of the process. It also discusses FPGA vs. ASIC Risk, MSP
Design Reusability, and Optical Flow.

2.1  Overview  of  VLSI  Design  Process 

The VLSI Design Process can be summed up in three main steps: architecture,
verification, and implementation, as seen in Figure 2.1. The architecture step
chooses among three design priorities: power, area, and delay (speed). Each
architecture has its pros and cons for design implementation, depending on what you
are trying to achieve. The intended use of the circuit will drive which architecture
you should use when designing it.

It  is  important  to  verify  that  your  expected  results  match  your  simulated  results. 
Once  you  have  determined  your  design  is  working  properly  through  simulations  it  is 
time  to  implement  your  design  on  an  FPGA  or  fabricated  circuit  such  as  an  ASIC. 


Figure  2.1:  Three  Main  Steps  for  VLSI  Design  Process 
Figure 2.2: Six Steps for VLSI Design Process


The  fabrication  process  for  a  circuit  can  take  anywhere  from  3  to  6  months  at 
the  foundry.  In  the  next  section,  the  VLSI  Design  Process  will  be  broken  down  for  a 
more  magnified  look  at  the  steps  involved  to  build  an  operational  design. 


2.2  VLSI  Design  Process 

The  typical  VLSI  Design  Process  can  be  broken  into  the  following  six  steps  and 
can  be  seen  in  Figure  2.2: 

1.  Specification 

2.  Architecture 

3.  RTL  Coding 

4.  RTL  Verification 

5.  Synthesis 

6.  Implementation/Fabrication 

Each  of  these  steps  will  be  discussed  further  for  an  understanding  of  what  is 
required  before  a  design  can  be  implemented  into  the  field  as  an  operational  weapons 
system. 


2.2.1 Specification. The specification for any design comes from the customer, who
has a specific need for a project they want built and implemented. There are three
different design strategies for circuits: custom, ASIC, and FPGA. The customer lists
the criteria they want for their system, which could be a specific size (area) of
the circuit, a certain power usage, a certain delay (speed) of the design, or even
that the design be written in VHDL. The vendor (designer) will discuss with the
customer the possibilities based on the technology available at the time the
original specification was created.

2.2.2 Architecture. There are three different architectures that can be implemented
in any design: power, area, and delay (speed).

2.2.2.1 Power. Clock control plays a major role when designing a circuit to reduce
power consumption. Reducing the clock speed of a circuit reduces the switching
activity, and limiting the amount of switching activity saves power. Besides
reducing the clock speed, designers have also proposed “clock gating by modifying
the design of the existing energy recovery clocked flip-flops to incorporate a power
saving feature that eliminates any energy loss on the internal clock and other nodes
of the flip-flops” [13]. In his book Advanced FPGA Design, Steve Kilts suggests the
following:

The  most  effective  and  widely  used  technique  for  lowering  the  dynamic 
power  dissipation  in  synchronous  digital  circuits  is  to  dynamically  disable 
the  clock  in  specific  regions  that  do  not  need  to  be  active  at  particular 
stages  in  the  data  flow  [9]. 

It would be ideal to have the circuit temporarily turn the clock off when a
particular section of the circuit is not required, in order to reduce power
consumption. These are a few of the techniques available for reducing power
consumption. Additional power reduction techniques can be found in Appendix A.
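The effect of clock speed and clock gating on power can be illustrated with the standard first-order model of dynamic switching power, P = alpha * C * Vdd^2 * f. The Python sketch below is only an illustration under that assumption. The function names and all numeric values are hypothetical and are not results from this thesis.

```python
def dynamic_power(activity: float, capacitance_f: float,
                  vdd_v: float, f_clk_hz: float) -> float:
    """First-order dynamic power in watts: alpha * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * f_clk_hz


def gated_power(activity: float, capacitance_f: float, vdd_v: float,
                f_clk_hz: float, enabled_fraction: float) -> float:
    """Average dynamic power when the clock runs only part of the time.

    enabled_fraction is the share of time (0.0 to 1.0) the gated region
    actually receives clock edges.
    """
    if not 0.0 <= enabled_fraction <= 1.0:
        raise ValueError("enabled_fraction must be between 0 and 1")
    return dynamic_power(activity, capacitance_f, vdd_v, f_clk_hz) * enabled_fraction


if __name__ == "__main__":
    # Illustrative values only: 20% activity, 1 nF switched, 2.5 V, 50 MHz.
    full = dynamic_power(0.2, 1e-9, 2.5, 50e6)
    slow = dynamic_power(0.2, 1e-9, 2.5, 25e6)
    gated = gated_power(0.2, 1e-9, 2.5, 50e6, enabled_fraction=0.25)
    print(f"full speed: {full:.4f} W, half speed: {slow:.4f} W, gated: {gated:.4f} W")
```

In this model, halving the clock frequency halves the dynamic power, and gating a region so that it is clocked a quarter of the time cuts its average dynamic power to a quarter.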
2.2.2.2 Area. Another architecture that can be used when developing a circuit design
is area. Reducing the area of a circuit depends on picking the correct topology. A
topology that focuses on reducing area does so by reusing “the logic resources to
the greatest extent possible, often at the expense of throughput (speed)” [9]. When
you want to increase the speed of a design, you pipeline it. To reduce the area,
however, you need to do the opposite: instead of creating pipelines, you roll them
up so that the available resources are reused. Also, it is a good idea, according to
Steve Kilts, to “share logic resources between different functional operations” [9].
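The idea of rolling up a design to reuse logic can be illustrated with an unsigned shift-and-add multiplier. The Python sketch below is a simplified behavioral model, not the thesis implementation. The function names and the adder and cycle counts are assumptions chosen to show the trade-off: the rolled-up version reuses one adder over several cycles, while the unrolled version spends one adder per multiplier bit to finish in a single pass.

```python
def multiply_rolled(a: int, b: int, width: int = 8):
    """Shift-and-add multiply that reuses ONE adder for `width` cycles.

    Returns (product, cycles, adders) for unsigned `width`-bit operands.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` unsigned bits")
    accumulator = 0
    for cycle in range(width):
        if (b >> cycle) & 1:
            # The same adder is used on every cycle.
            accumulator = accumulator + (a << cycle)
    return accumulator, width, 1


def multiply_unrolled(a: int, b: int, width: int = 8):
    """Same arithmetic with one adder per multiplier bit, done in one pass.

    Returns (product, cycles, adders) for unsigned `width`-bit operands.
    """
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must fit in `width` unsigned bits")
    partial_products = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    return sum(partial_products), 1, width


if __name__ == "__main__":
    print(multiply_rolled(13, 11))    # small area, more cycles
    print(multiply_unrolled(13, 11))  # more area, fewer cycles
```

Both functions return the same product. Only the modeled resource count and cycle count differ, which is the area versus speed trade-off described above.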

2.2.2.3 Delay (Speed). The third architecture that can be used when designing a
circuit is speed. There are three ways to describe speed in circuit design:
throughput, latency, and timing. In the book Advanced FPGA Design [9], Kilts gives a
description of each.

A  high-throughput  design  is  one  that  is  concerned  with  the  steady-state 
data  rate  but  less  concerned  about  the  time  any  specific  piece  of  data 
requires  to  propagate  through  the  design  (latency).  A  low-latency  design 
is  one  that  passes  the  data  from  the  input  to  the  output  as  quickly  as 
possible  by  minimizing  processing  delays.  Timing  refers  to  the  clock  speed 
of  a  design.  The  maximum  delay  between  any  two  sequential  elements  in 
a  design  will  determine  the  max  clock  speed  [9]. 

A key factor to consider when designing a circuit for speed is to use pipelining
wherever possible to increase throughput [9]. The idea of pipelining is very similar
to how an assembly line works. Each member continually performs a specific task:
they finish one item, pass it on to the next station, and get another. The result is
that a completed product is continuously produced. Another technique [9] is to have
the system run things in parallel to speed up the process. This technique is most
useful when doing math calculations in a design.
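The assembly-line behavior of a pipeline can be sketched with a small cycle-count model. The Python code below assumes that every stage takes exactly one clock cycle and that a new item can enter on every cycle. These are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def unpipelined_cycles(items: int, stages: int) -> int:
    """Cycles when each item must finish all stages before the next starts."""
    return items * stages


def pipelined_cycles(items: int, stages: int) -> int:
    """Cycles when a new item enters every cycle: fill time plus one per item."""
    return 0 if items == 0 else stages + items - 1


def simulate_pipeline(items: int, stages: int) -> int:
    """Clock a chain of `stages` registers and count cycles until all items exit."""
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    pipeline = [None] * stages
    next_item = 0
    completed = 0
    cycles = 0
    while completed < items:
        entering = next_item if next_item < items else None
        if entering is not None:
            next_item += 1
        # Every item advances one stage; a new item enters stage 0.
        pipeline = [entering] + pipeline[:-1]
        cycles += 1
        # The item in the final stage finishes at the end of this cycle.
        if pipeline[-1] is not None:
            completed += 1
    return cycles


if __name__ == "__main__":
    print(unpipelined_cycles(100, 4), pipelined_cycles(100, 4), simulate_pipeline(100, 4))
```

The latency of a single item is unchanged at the number of stages. What improves is throughput: once the pipeline is full, one result is completed on every clock cycle.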
2.2.3 RTL Coding. The third step in the VLSI Design Process is RTL Coding, which
amounts to choosing the language in which you will code your design. There are
different Hardware Description Languages (HDL) that can be used to make an RTL
module. The most commonly used RTL modeling languages are the following:

1.  VHDL 

2.  Verilog 

3. SystemVerilog

VHDL is the hardware description language adopted by the Department of Defense
(DoD). For this thesis we will code our design in VHDL.

2.2.4 RTL Verification. Step four in the VLSI Design Process is RTL Verification,
in which you verify that your VHDL code simulates correctly. The simulation waveform
generated in Modelsim® shows whether the circuit is functioning. If this waveform
gives the expected results, then your RTL code is working properly.
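The principle of comparing simulated results against expected results can be sketched in software. The Python code below is an illustration only and does not reproduce the Modelsim® flow used in this thesis. It assumes a hypothetical bit-level model of a ripple-carry adder as the design under test and uses ordinary integer addition as the expected result.

```python
def full_adder(a: int, b: int, carry_in: int):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(x: int, y: int, width: int):
    """Bit-level model of a `width`-bit ripple-carry adder.

    Returns (sum, carry_out), where sum holds the low `width` bits.
    """
    carry = 0
    result = 0
    for i in range(width):
        sum_bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= sum_bit << i
    return result, carry


def verify_adder(width: int):
    """Apply every input pair and list those where the model disagrees."""
    mismatches = []
    for x in range(1 << width):
        for y in range(1 << width):
            got_sum, got_carry = ripple_carry_add(x, y, width)
            expected = x + y  # golden reference
            if (got_carry << width) | got_sum != expected:
                mismatches.append((x, y))
    return mismatches


if __name__ == "__main__":
    print("mismatches:", verify_adder(4))
```

An empty mismatch list plays the same role as a waveform that matches the expected results: it indicates the model behaves as intended for the vectors applied.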

2.2.5 Synthesis/Manual Layout. A flexible synthesis tool, such as Leonardo Spectrum
from Mentor Graphics®, allows a synthesizable HDL design to be synthesized for both
FPGA and ASIC targets. The software tool uses optimization algorithms to determine
the best floorplan, placement, and routing of the design. After synthesis, an FPGA
design only needs to be downloaded to the FPGA board to verify whether the
simulation results match the synthesis results and meet the specifications stated at
the beginning of the VLSI Design Process. An ASIC design, by contrast, still needs
to be sent to a foundry to be fabricated after it is synthesized. Using an FPGA
eliminates the fabrication step; an FPGA is also inexpensive to own and can be
reconfigured for a new design in a matter of seconds, depending on the size of the
design. Therefore, if the timing of the design did not meet the specifications, the
designer would be aware of the problem right away and could begin to correct it.
With an ASIC design, this problem would not be found until the design was tested
after it was received from the foundry.

Figure 2.3: Nand Gate Layout

When developing a custom design, a manual layout could be used. This requires you
to hand place every piece of the circuit in a location that you determine, and then
to route all the cell wires together. This is a very tedious and time-consuming
process. The designer has to worry about design rule violations, such as placing
cells too close together, and about placing cells too far apart, which causes timing
delays. An example of a handmade layout of a Nand Gate is seen in Figure 2.3. This
is only one gate. For a complicated design, in the range of hundreds of millions of
gates, you can see how this task can be extremely time consuming. Once the gates
have been laid out, you have to determine whether they are in the optimal position
for the place and route step. A manual layout also lacks scalability: when new
technologies are created, the old layout may not be usable at all in ASICs. However,
ASIC designs will have a higher absolute performance than FPGA designs.


2.2.6 Implementation/Fabrication. Implementation/Fabrication is the final step in
the VLSI Design Process. At this point the design has either been synthesized for an
ASIC or laid out manually. The design is finalized and sent as a computer file to
the foundry for the fabrication process. At the foundry, several different metal
layers and polysilicon layers are used to create the design. A set of test vectors
also needs to be sent to the foundry with the design; these are required to
determine whether the design is working correctly. If the design comes back correct,
this ASIC or custom design can only be used for this given application. On the other
hand, if the design comes back not working properly, the designer has to analyze the
circuit and determine where in the six-step process the design failed. This
basically leaves the designer at square one: all of this time has been invested in a
product that does not work. Even if the fix is found within a few days, it will
still be 3 to 6 months from the time the design is sent to the foundry before the
designer sees it again. Making an ASIC design is therefore very costly and time
consuming. Unlike the ASIC, the FPGA can be designed and implemented in a shorter
time span and is an economical, reusable, and reconfigurable product.

2.3  FPGA  vs.  ASIC  Risk 

Now that the VLSI design cycle has been discussed, let us examine the risk involved
in building a circuit as an FPGA or ASIC design. The FPGA is an inexpensive design
option that costs only a few thousand dollars for a board. The Static Random Access
Memory (SRAM) FPGA is reconfigurable: once you put one design on the board, you can
remove it and reprogram the board for a completely new design at no additional cost.
Once the design is synthesizable and thoroughly tested, it requires less time to
become operational. There is also a higher chance that an FPGA design will
synthesize and work the first time, unlike an ASIC design, saving a company time and
money. An ASIC design, on the other hand, is an expensive investment and requires
substantially more time and money to implement. For example, suppose a company is
implementing a new design and chooses the ASIC route. The company could invest a
million dollars in the program before the design is even fabricated, because of the
labor-intensive effort required. If the design fails, it comes back from fabrication
to the engineers, who must examine it thoroughly to understand why it failed. Once
the failure has been determined, it may cost an additional $100,000 from the time
the engineers determine the problem to the time the design is refabricated and
works. Also, this design can only be used for this single application. If the
application were a satellite in space and the design failed, the satellite would
lose that functionality. If the design were on an FPGA, however, it could be
reprogrammed in a matter of minutes to keep the system functioning. As you can see,
there are several risks to consider when deciding which design solution to use. In
this thesis, we will create a set of synthesizable VHDL libraries targeting FPGAs
and ASICs. However, we put emphasis on FPGAs because of the extra time and cost
required to fabricate ASIC solutions for validation purposes.

2.4 MSP Design Reusability

You have now seen the design steps required for a VLSI circuit design and the two
different approaches to reach a solution. MSP offers design reusability by creating
multiple modules that perform the same task, each optimized for power, area, or
delay. A typical design takes an average of 24 months to complete. An MSP design can
take 3 months on average to complete because of the synthesizable, reusable
libraries. Each design is unique, and several variables affect the length of time
required to complete it. Some of the variables to consider are the complexity of the
design, whether anything can be reused from a previous design, whether the task is
fully defined, and the experience of the designer. A typical design timeline can be
seen in Figure 2.4. The MSP libraries cover the Architecture, RTL Coding, RTL
Verification, and Synthesis steps. Using these MSP libraries can save time, money,
and resources when designing a circuit. Chang and Aguan [3] state how important it
is to have reusable VHDL modules in their journal article “Design-for-reusability in
VHDL”:

The reuse of electronic components can improve productivity in system design.
However, without careful planning, components are rarely designed for reuse.

Figure 2.4: Standard Design Time vs. MSP Design Time

There is also more than one option when designing a multiplier, adder, or
controller. You can build a behavioral combinational multiplier, a structural
combinational multiplier, a behavioral sequential multiplier, a structural
sequential multiplier, or even a Booth multiplier, to name a few. They all have
strengths and weaknesses. The MSP idea is to build reusable libraries that meet the
needs of designers, and a multiplier is a good example. The designer can build three
variations of a multiplier: one optimized for power, one for area, and one for delay
(speed). When a new requirement is generated that calls for a multiplier, the
reusable power, area, and delay modules will be available. Since these modules have
already been optimized, there is no additional work for the designer. This concept
goes back to the power, area, and delay triangle shown in Figure 1.1. With the
pre-built modules, the designer only has to choose which module will work best for
the given specifications. This can be seen in Table 2.1, which lists power, area,
and maximum clock frequency results for four types of multiplier designs. These
results came from the Cadence® software tool. The Constrained column in the table
gives a clear picture of what power, area, and maximum clock frequency can be
achieved. When combinational designs are synthesized in Cadence®, there is no clock
with which to determine accurately how fast the design will run. Therefore, Cadence®
gives a best-guess estimate for the power, area, and delay. As you can see, the
Booth multiplier has the highest maximum clock frequency but the second largest
power consumption. One multiplier design may have the smallest area but the slowest
delay. It is up to the designer to pick which module will work best for their
application.

Table 2.1: Summary of Maximum Clock Frequency for Modules

Type of Multiplier        Power (mW)  Area (mm^2)  Max. Freq. (MHz)  Constrained
Booth                     37.17       0.1537       53.99             Yes
Combinational Behavior    7.35        0.2043       53.09             No
Combinational Structural  9.13        0.2422       28.23             No
Sequential Behavior       64.87       0.2151       37.31             Yes
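The selection step can be sketched as a lookup over the values reported in Table 2.1. The Python code below is only an illustration of choosing a module by priority. The data structure and the function name pick_multiplier are hypothetical, and the sketch treats the highest maximum clock frequency as the best choice for delay.

```python
# Values copied from Table 2.1 (power in mW, area in mm^2, frequency in MHz).
MULTIPLIERS = {
    "Booth": {"power_mw": 37.17, "area_mm2": 0.1537, "fmax_mhz": 53.99},
    "Combinational Behavior": {"power_mw": 7.35, "area_mm2": 0.2043, "fmax_mhz": 53.09},
    "Combinational Structural": {"power_mw": 9.13, "area_mm2": 0.2422, "fmax_mhz": 28.23},
    "Sequential Behavior": {"power_mw": 64.87, "area_mm2": 0.2151, "fmax_mhz": 37.31},
}


def pick_multiplier(priority: str) -> str:
    """Return the multiplier that is best for 'power', 'area', or 'delay'."""
    if priority == "power":
        return min(MULTIPLIERS, key=lambda name: MULTIPLIERS[name]["power_mw"])
    if priority == "area":
        return min(MULTIPLIERS, key=lambda name: MULTIPLIERS[name]["area_mm2"])
    if priority == "delay":
        # Highest maximum clock frequency is taken as the fastest design.
        return max(MULTIPLIERS, key=lambda name: MULTIPLIERS[name]["fmax_mhz"])
    raise ValueError("priority must be 'power', 'area', or 'delay'")


if __name__ == "__main__":
    for priority in ("power", "area", "delay"):
        print(priority, "->", pick_multiplier(priority))
```

Note that the combinational entries in the table are unconstrained estimates, so a real selection would weigh them with that in mind.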

If you take this a step further and expand to other modules such as adders,
subtractors, and controllers, you will have generated an arsenal of modules that
have been optimized for power, area, and delay. Choosing among them is like picking
out a new car: what options do you want, such as power windows, a CD player, or a
full-size spare tire? Having these libraries built, reconfigurable, and reusable
makes designing a weapon system simpler and saves time, money, and resources. Also,
using an FPGA makes it easier to upgrade the system at a lower cost and in less
time. There will be several variations of the same module focusing on optimized
power, area, and delay. For example, if a design calls for fast multiplications, the
designer can choose the multiplier that has been optimized for speed from the
reusable libraries that are already built.

In this thesis we will be implementing a Booth multiplier. The Booth multiplier is based on adding, subtracting, and shifting the binary values several times. Andrew Booth [5] noticed that this can be achieved by using a lookup table to determine whether the binary value needs to be added, subtracted, or left alone (no operation) before the binary values are shifted one place. The Booth algorithm is summarized in points 1-3 below [5]. By using a lookup table, it is possible to quickly determine what operation needs to be performed. The main advantage of the Booth multiplier is that it can perform no operation when there is a run of “00” or “11” in the number being multiplied. This do-nothing option reduces the number of add and subtract operations required to compute the multiplication. For this thesis we will implement a 2-bit shift Booth multiplier, which cuts the number of clock cycles to half the number of input bits. For example, if the bit width is 32 bits, it will require only 16 clock cycles to complete the multiplication.

Booth  Multiplier  Algorithm  Summary: 

1.  Examine each pair of digits in the multiplier, creating the first pair by appending
a dummy ‘0’ at the least significant end.

If  the  pair  is  01,  add  the  multiplicand. 

If  the  pair  is  10,  subtract  the  multiplicand. 

Otherwise,  do  nothing. 

2.  Shift  both  partial  product  and  multiplier  one  place  to  the  right,  allowing  the 
next  pair  of  digits  to  be  examined. 

3.  Repeat  as  many  times  as  there  are  digits  in  the  multiplier  [5]. 
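
The three steps above can be sketched as a small software model. This is only an illustration of the one-bit (radix-2) algorithm as summarized, not the 2-bit shift multiplier built in this thesis and not its VHDL. The function name `booth_multiply`, the register names, and the choice to make the accumulator one bit wider than the operands (so that subtracting the most negative multiplicand does not overflow) are assumptions made for this sketch.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Radix-2 Booth multiplication of two signed `bits`-wide integers."""
    acc_width = bits + 1                 # assumption: one guard bit in the accumulator
    acc_mask = (1 << acc_width) - 1
    q_mask = (1 << bits) - 1

    m = multiplicand & acc_mask          # sign-extended two's-complement multiplicand
    acc = 0                              # upper half of the partial product
    q = multiplier & q_mask              # lower half initially holds the multiplier
    q_prev = 0                           # dummy '0' appended at the least significant end

    for _ in range(bits):                # step 3: repeat once per multiplier digit
        pair = (q & 1, q_prev)           # step 1: examine the current pair of digits
        if pair == (0, 1):
            acc = (acc + m) & acc_mask   # pair 01: add the multiplicand
        elif pair == (1, 0):
            acc = (acc - m) & acc_mask   # pair 10: subtract the multiplicand
        # pairs 00 and 11: do nothing

        # step 2: arithmetic shift right of the combined (acc, q, q_prev) register
        q_prev = q & 1
        q = (q >> 1) | ((acc & 1) << (bits - 1))
        acc = (acc >> 1) | (acc & (1 << (acc_width - 1)))

    result = (acc << bits) | q
    if result >> (acc_width + bits - 1): # interpret the register pair as signed
        result -= 1 << (acc_width + bits)
    return result


product = booth_multiply(3, -2, 4)       # 4-bit operands
```

In a hardware version the loop body would correspond to one pass through the add/subtract and shift states of the controller; the model only checks the arithmetic.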

2.5  Optical  Flow 

The Optical Flow algorithm compares two images to see what is different between them. Some applications where optical flow is used are change detection, computer vision, pattern recognition, tracking targets, and image processing. Optical Flow is best described by Horn and Schunck [7] in their paper, “Determining Optical Flow.”

Optical flow is the distribution of apparent velocities of movement of
brightness patterns in an image. Optical flow can arise from relative motion
of objects and the viewer. Consequently, optical flow can give important
information about the spatial arrangement of the objects viewed and
the rate of change of this arrangement.

Figure 2.5: Optical Flow Vectors for a 250 x 400 Image

Two well-regarded methods for calculating optical flow are the Horn-Schunck [7] and Lucas-Kanade [10] methods. The Horn-Schunck method [7] looks at the differences in brightness and contrast between the two images to estimate what changes have occurred. The changes are represented in a vector field that shows the direction of motion in the second image relative to the first image. The Lucas-Kanade method [10] makes use of the “spatial intensity gradient of the images.” Both the Horn-Schunck and the Lucas-Kanade methods require a large number of calculations to determine whether there is a change between the first image and the second image. In this project, we will use the Lucas-Kanade method to determine an optical flow solution between two images. An example of a vector field produced between two images using the Lucas-Kanade method is shown in Figure 2.5. These results were obtained from Matlab® by comparing two images that are 250 x 400 pixels.

2.6 Chapter Summary

The background information in Chapter 2 covered the following: VLSI Design Process, FPGA vs. ASIC Risk, MSP Design Reusability, and Optical Flow. Several low power, area, and delay (speed) implementations were also discussed in the architecture section of the VLSI Design Flow section. MSP has shown the importance of creating many different modules, each optimized for power, area, or delay, that perform the same task. Two methods for performing Optical Flow were discussed: the Lucas-Kanade and Horn-Schunck methods. We will build reusable libraries in VHDL for the Lucas-Kanade method so that the libraries can later be used to build a complete Lucas-Kanade Optical Flow system. This image processing can be used as part of the target tracking project. These ideas will be discussed further in Chapter 3 to develop this thesis project.

III. Methodology


The methodology used to convert the Matlab® commands that generated the Dense Optical Flow will be discussed in this chapter. The research goal is to convert the Matlab® commands used in the Dense Optical Flow, located in Appendix B, into reusable VHDL libraries such as Conv2, Matrix Transpose, Round, Floor, and Pinv. The goal is to demonstrate functioning MSP modules that are reusable for image processing applications such as Optical Flow, DSP, computer vision, pattern recognition, tracking targets, and change detection. These MSP libraries will be the foundation for these applications.

The Dense Optical Flow Matlab® command functions will be created and demonstrated using smaller modules. These modules will make up the parts of each Matlab® command and can later be used to build an Optical Flow system using MSP modules. Examples of some of the Matlab® commands that are used in the Optical Flow code can be seen in Listing III.1.


Listing III.1:

    % Example of using the conv2 command
    hResult = conv2(im, mask);

    % Example of using the matrix transpose command
    curFx = curFx';

    % Example of using the Pinv command
    U = pinv(A'*A)*A'*curFt;

    % Example of using the round command
    uln = round(uln);

    % Example of using the floor command
    halfWindow = floor(windowSize/2);

To create these commands, smaller modules were designed around the Matlab® Dense Optical Flow code written by Sohaib Khan [8], which uses the Lucas-Kanade algorithm [10]. The reason for creating individual modules and making them configurable components is to be able to take a generic module and shape it to meet the needs of a specific design. We will assume that the image sizes will not change while the data is being recorded with a camera, and the modules will be set to receive grayscale images. These settings are configurable to meet future needs. The configurable parameters of many of the individual modules can be seen in Table 3.1.


Table 3.1: Configurable parameters

Parameter       Description
address_width   Bit width of matrix size (row, column)
data_width      Bit width of pixel depth values

3.1  Overall  Design 

The overall design of the Dense Optical Flow was broken into smaller parts called modules. These modules were designed to take on the same characteristics as the Matlab® commands. Two different designs for the Reduce Matrix Function will be developed in VHDL and compared in terms of area and power consumption. These modules were built using VHDL and tested with test benches to prove that they functioned properly. Other Matlab® commands that will be created in VHDL are the following: Conv2, Round, Floor, Matrix Transpose, Compute Derivatives, Reduce Matrix Function, Adder, Multiplier, Divider, Subtractor, and Pinv. The goal is for the VHDL modules to produce the same results as the Matlab® functions. For instance, let the 2x2 matrices A and B have the following values:


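\[
\text{Matrix } A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\qquad
\text{Matrix } B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
\]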

The following Matlab® code calculates the two-dimensional convolution of Matrices A and B, as seen in Equation 3.1. The resulting answer is stored in the 3x3 Matrix C.

\[
C = \mathrm{Conv2}(A, B) \tag{3.1}
\]


Matrix C produces the following result:

\[
\text{Matrix } C = \begin{pmatrix} 5 & 16 & 12 \\ 22 & 60 & 40 \\ 21 & 52 & 32 \end{pmatrix}
\]


These results match the results from the VHDL convolution module with 0% error. The other Matlab® commands and functions were converted to VHDL to build the reusable module libraries that lay the foundation for the Dense Optical Flow algorithm. The commands that are specific to Matlab® and that were developed and function properly include Conv2, Matrix Transpose, Round, Floor, and Pinv.


3.2  Two  Dimensional  Convolution 

The two dimensional convolution Matlab® command Conv2 can be seen in Equation 3.1. The Conv2 Matlab® command is the backbone of the image processing that has to take place in an application such as Optical Flow. Other Matlab® functions that were developed by Sohaib Khan, such as the Reduce Function and Computing Derivatives, cannot be developed without the Conv2 command. Therefore, the Conv2 command was the first module that was created.

The size of Matrix C is determined by the sizes of Matrices A and B. For a Matrix A of size [D,E] and a Matrix B of size [F,G], Matrix C has size [D+F-1, E+G-1]. The algorithm for Conv2 can be seen in Equation 3.2.


\[
C(n_1, n_2) = \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} a(k_1, k_2)\, b(n_1 - k_1,\, n_2 - k_2) \tag{3.2}
\]
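
As a cross-check of Equation 3.2 and the size rule [D+F-1, E+G-1], the convolution can be written as a short software reference model. This is a sketch in Python rather than the Matlab® or VHDL code of this thesis; the function name `conv2` and the list-of-rows matrix representation are assumptions made for illustration. The infinite sums reduce to finite loops because both matrices are zero outside their bounds.

```python
def conv2(a, b):
    """Full two-dimensional convolution of matrices stored as lists of rows."""
    d, e = len(a), len(a[0])             # Matrix A is D x E
    f, g = len(b), len(b[0])             # Matrix B is F x G
    # The output is (D+F-1) x (E+G-1) and starts as all zeros.
    c = [[0] * (e + g - 1) for _ in range(d + f - 1)]
    for i in range(d):
        for j in range(e):
            for k in range(f):
                for l in range(g):
                    # Each product a(i,j)*b(k,l) contributes to output cell (i+k, j+l).
                    c[i + k][j + l] += a[i][j] * b[k][l]
    return c


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = conv2(A, B)
```

With the 2x2 matrices A and B used earlier in this chapter, `C` holds the 3x3 result shown for Matrix C. In the VHDL design the four nested loops are replaced by a state machine.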

From the Matlab® Conv2 algorithm, VHDL modules were written and implemented with the same characteristics as the Matlab® command, using a state machine to take the place of the nested for-loops. The Conv2 Matlab® command requires the following modules to be built in VHDL in order for it to function:


1.  Memory  Module 

2.  Multiplier  Module 

3.  Control  Module 

4.  Multiplexer  Module 

5.  Adder  Module 

6.  Register  Module 

Once these modules are built, they will be connected together, like connecting Legos® together, to build the Conv2 Module in VHDL. Simulations will be run to test the VHDL Conv2, and the results will be compared to the Matlab® Conv2 command.

3.2.1 Memory. Images are made up of grayscale values, integers between 0 and 255, that make up the brightness of the image. These values are used to discern between adjacent pixels of the image. An image is nothing more than a two-dimensional matrix of rows and columns loaded with integer values that represent the brightness of the pixels. The memory method that will be used is to initially generate a matrix larger than the image on which the Conv2 command will be performed. This is required because, when the Conv2 command finishes computing, the resultant matrix is larger than the initial image matrix. To account for this, the memory module will initially be filled with all zeros. This oversized matrix prevents the Conv2 command from stepping out of bounds when computing its results. The matrix can then be filled with the required grayscale values that represent the image.

3.2.2 Multiplier. The initial Multiplier Module that will be developed will be only a basic behavior model. The second Multiplier Module that will be developed will be a 2-bit Booth Multiplier. This multiplier will be able to cut in half the number of clock cycles needed to calculate the results. The data input for a pixel value of the image is 8 bits, but we also need to take into account the fractional parts of the numbers that are multiplied together. The Matlab® Optical Flow uses a 1 x 5 mask matrix that is made up of a Gaussian distribution. The 1 x 5 Mask matrix has the following values:

Mask Matrix ( .05 .25 .4 .25 .05 )

When these calculations are performed, the results have a fractional part. Therefore, we will use fixed-point notation to represent the fractional part of the number. We will allocate 16 bits for the fractional part to represent the Gaussian distribution, to minimize the error caused by the fact that values such as .05 and .4 can only be approximated in binary. This leaves us with 16 bits for the fractional part and 8 bits for the grayscale values, for a total of 24 bits required to represent the image data. We will initially set the image data width to 24 bits. However, this parameter is configurable to a larger bit width, if required.
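
The effect of the 16 fractional bits on the mask values can be illustrated with a short sketch. The helper names `to_fixed` and `to_float`, and the use of round-to-nearest when quantizing, are assumptions made here; the thesis does not state how the VHDL constants were rounded.

```python
FRAC_BITS = 16                           # fractional bits, as chosen in the text


def to_fixed(x: float, frac_bits: int = FRAC_BITS) -> int:
    """Quantize a real value to a fixed-point integer (assumed round-to-nearest)."""
    return round(x * (1 << frac_bits))


def to_float(q: int, frac_bits: int = FRAC_BITS) -> float:
    """Convert a fixed-point integer back to the real value it represents."""
    return q / (1 << frac_bits)


mask = [0.05, 0.25, 0.4, 0.25, 0.05]
mask_fixed = [to_fixed(m) for m in mask]
# Absolute quantization error of each mask value.
errors = [abs(to_float(q) - m) for q, m in zip(mask_fixed, mask)]

# An 8-bit pixel multiplied by a fixed-point mask value keeps 16 fractional bits.
pixel = 200
scaled = pixel * mask_fixed[0]           # represents roughly 200 * 0.05
approx = to_float(scaled)
```

Under these assumptions .25 is represented exactly, while .05 and .4 carry an error of at most half of one least significant bit (2^-17), which then propagates into every product with a pixel value.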

3.2.3 Control. The Control Module could be implemented using a for-loop, but such loops cannot typically be used for algorithmic iterations in synthesizable code [9]. Therefore, a state machine will be built to unroll the iterative loop, which will increase throughput for the design.
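
The idea of replacing nested loops with a state machine can be sketched in software as a transition function that advances one step per clock tick. The state names `IDLE`, `VISIT`, and `DONE` and the functions `step` and `run` are hypothetical and chosen only for this illustration; they are not the states of the Control Module built in this thesis.

```python
def step(state, row, col, rows, cols):
    """One clock tick of a hypothetical index-walking controller."""
    if state == "IDLE":
        return "VISIT", 0, 0             # start at the first matrix element
    if state == "VISIT":
        if col + 1 < cols:
            return "VISIT", row, col + 1 # advance along the current row
        if row + 1 < rows:
            return "VISIT", row + 1, 0   # wrap to the start of the next row
        return "DONE", row, col          # last element has been visited
    return "DONE", row, col


def run(rows, cols):
    """Clock the controller until it reports DONE and record the visited indices."""
    state, row, col = "IDLE", 0, 0
    visited = []
    while state != "DONE":
        if state == "VISIT":
            visited.append((row, col))
        state, row, col = step(state, row, col, rows, cols)
    return visited


order = run(2, 3)
```

The controller visits the indices in the same order as two nested for-loops over rows and columns, but all loop bookkeeping lives in explicit state and counters, which is the form a synthesizable design needs.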

3.2.4 Multiplexer, Adder, Register. The Multiplexer Module is connected to the Memory Module and is used to switch between the loading of images and the Control Module. Once the image is loaded into the Memory, the Multiplexer will switch from load mode so that the Memory communicates only with the control logic. The Adder Module is used to add the values together once the multiplication is completed for a set of values. The Conv2 Matlab® command requires the products of matrix elements to be added together. The Register Module latches the values that have been added together. Once all the calculations have been completed, the Control Module will send a signal to Matrix C so the data in the Register Module can be written into Matrix C at location (0,0).

3.3 Matrix Transpose

Part  of  the  Matlab®  code  transposes  the  input  image.  The  following  modules 
are  required  to  build  the  Matrix  Transpose  Module: 

1.  Memory  Module 

2.  Control  Module 

3.  Multiplexer  Module 

Since the goal of the MSP is to create reusable libraries, creating the Matrix Transpose Module is simplified because the Memory Module was already developed while creating the Conv2 command. Therefore, the only new module that needs to be built is a state machine Control Module to transpose a matrix.

3.4 Compute Derivatives

Another Matlab® function module that needs to be developed is one that computes derivatives. The Fx and Fy derivatives are used for edge detection of an object that has moved between Image1 and Image2. The Ft derivative is the sum of the convolution results for Image1 and Image2, where the two 2x2 masks have opposite signs (see Listing III.2). You can picture it as placing Image2 on top of Image1, which forms the Ft derivative. The Compute Derivatives Module is built around the Matlab® Conv2 command. Again, the MSP reusable libraries make it possible to create the Compute Derivatives function by reusing the Memory and Conv2 modules. Equation 3.3 is taken from the Compute Derivatives function that was written in Matlab®. This equation requires the Conv2 command to be used twice: Image1 is convolved with a fixed 2x2 matrix, and the result is added to the result of convolving Image2 with a fixed 2x2 matrix.

\[
F_x = \mathrm{Conv2}(\mathrm{Image1},\ 0.25\,[-1\ \ 1;\ -1\ \ 1]) + \mathrm{Conv2}(\mathrm{Image2},\ 0.25\,[-1\ \ 1;\ -1\ \ 1]) \tag{3.3}
\]

The following modules are required to build the Compute Derivative Module:


1.  Memory  Module 

2.  Control  Module 

3.  Conv2  Module 

The Matlab® Compute Derivatives Function code can be seen in Listing III.2.

Listing III.2:

    % Example call
    [fx, fy, ft] = ComputeDerivatives(im1, im2);

    function [fx, fy, ft] = ComputeDerivatives(im1, im2);
    % ComputeDerivatives Compute horizontal, vertical and
    % time derivative between two gray-level images.

    if (size(im1,1) ~= size(im2,1)) | (size(im1,2) ~= size(im2,2))
        error('input images are not the same size');
    end;

    if (size(im1,3) ~= 1) | (size(im2,3) ~= 1)
        error('method only works for gray-level images');
    end;

    fx = conv2(im1, 0.25*[-1 1; -1 1]) + conv2(im2, 0.25*[-1 1; -1 1]);
    fy = conv2(im1, 0.25*[-1 -1; 1 1]) + conv2(im2, 0.25*[-1 -1; 1 1]);
    ft = conv2(im1, 0.25*ones(2)) + conv2(im2, -0.25*ones(2));

    % make same size as input
    fx = fx(1:size(fx,1)-1, 1:size(fx,2)-1);
    fy = fy(1:size(fy,1)-1, 1:size(fy,2)-1);
    ft = ft(1:size(ft,1)-1, 1:size(ft,2)-1);
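
For checking a hardware implementation against Listing III.2, the same computation can be mirrored in a small software reference model. The following is a Python sketch, not part of the thesis code; the helper names `conv2`, `add`, `crop`, and `compute_derivatives` are assumptions, and the gray-level check on the third dimension is omitted because the images are plain two-dimensional lists.

```python
def conv2(a, b):
    """Full two-dimensional convolution of matrices stored as lists of rows."""
    c = [[0.0] * (len(a[0]) + len(b[0]) - 1) for _ in range(len(a) + len(b) - 1)]
    for i, a_row in enumerate(a):
        for j, a_val in enumerate(a_row):
            for k, b_row in enumerate(b):
                for l, b_val in enumerate(b_row):
                    c[i + k][j + l] += a_val * b_val
    return c


def add(x, y):
    """Element-wise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(rx, ry)] for rx, ry in zip(x, y)]


def crop(m):
    """Drop the last row and column so the result matches the input size."""
    return [row[:-1] for row in m[:-1]]


def compute_derivatives(im1, im2):
    if len(im1) != len(im2) or len(im1[0]) != len(im2[0]):
        raise ValueError("input images are not the same size")
    kx = [[-0.25, 0.25], [-0.25, 0.25]]      # 0.25*[-1 1; -1 1]
    ky = [[-0.25, -0.25], [0.25, 0.25]]      # 0.25*[-1 -1; 1 1]
    kt = [[0.25, 0.25], [0.25, 0.25]]        # 0.25*ones(2)
    neg_kt = [[-0.25, -0.25], [-0.25, -0.25]]
    fx = crop(add(conv2(im1, kx), conv2(im2, kx)))
    fy = crop(add(conv2(im1, ky), conv2(im2, ky)))
    ft = crop(add(conv2(im1, kt), conv2(im2, neg_kt)))
    return fx, fy, ft


im = [[10, 20], [30, 40]]
fx, fy, ft = compute_derivatives(im, im)
```

When both inputs are the same image, as in this usage example, the time derivative `ft` is zero everywhere, which makes a convenient sanity check for a test bench.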
3.5 Reduce Matrix Function


When  dealing  with  image  processing,  the  images  need  to  be  down  sampled  in 
size.  This  down  sampling  is  part  of  the  Optical  Flow.  We  will  use  the  terminology 
Reduce  Matrix  Function  instead  of  down  sampling.  The  Reduce  Matrix  Function  is 
another  Matlab®  function  that  needs  to  be  created  in  VHDL.  The  Reduce  Matrix 
Function  Module  takes  the  initial  image  and  reduces  it  in  size  by  one  half.  For 
example,  an  initial  image  size  of  250  x  400  will  be  125  x  200  after  performing  the 
Reduce  Matrix  Function.  Here  is  another  example  of  how  the  MSP  design  thought 
process  uses  the  created  reusable  libraries  to  develop  another  module.  The  key  module 
in  image  processing  is  the  Conv2  Module;  again,  it  is  used  to  build  another  module. 
The  following  modules  are  required  to  build  the  Reduce  Matrix  Function  Module: 


1.  Conv2  Module 

2.  Control  Module 

3.  Reduce Control Horizontal Module

4.  Reduce  Control  Vertical  Module 

5.  Memory  Module 

6.  Multiplexer  Module 

7.  Adder  Module 

8.  Register  Module 

The Matlab® Reduce Matrix Function code can be seen in Listing III.3.

Listing III.3:

    function smallIm = Reduce(im)
    % REDUCE Compute smaller layer of Gaussian Pyramid

    % Sohaib Khan, Feb 16, 2000

    % Algo
    % Gaussian mask = [0.05 0.25 0.4 0.25 0.05]
    % Apply 1d mask to alternate pixels along each row of
    % apply 1d mask to each pixel along alternate columns
    % of resulting image

    mask = [0.05 0.25 0.4 0.25 0.05];

    hResult = conv2(im, mask);
    hResult = hResult(:, 3:size(hResult,2)-2);
    hResult = hResult(:, 1:2:size(hResult,2));

    vResult = conv2(hResult, mask');
    vResult = vResult(3:size(vResult,1)-2, :);
    vResult = vResult(1:2:size(vResult,1), :);

    smallIm = vResult;
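
The sequence of matrix sizes produced by Listing III.3 can be traced with a short software model. This is a Python sketch written for illustration, not the thesis code; the names `conv2` and `reduce_image` are assumptions. Matlab® indices are 1-based, so `3:size-2` becomes the Python slice `[2:-2]` and `1:2:end` becomes `[::2]`.

```python
def conv2(a, b):
    """Full two-dimensional convolution of matrices stored as lists of rows."""
    c = [[0.0] * (len(a[0]) + len(b[0]) - 1) for _ in range(len(a) + len(b) - 1)]
    for i, a_row in enumerate(a):
        for j, a_val in enumerate(a_row):
            for k, b_row in enumerate(b):
                for l, b_val in enumerate(b_row):
                    c[i + k][j + l] += a_val * b_val
    return c


def reduce_image(im):
    """Halve an image in each dimension following the steps of Listing III.3."""
    mask = [0.05, 0.25, 0.4, 0.25, 0.05]
    h = conv2(im, [mask])                    # 1 x 5 mask: R x (C + 4)
    h = [row[2:-2] for row in h]             # drop two border columns per side
    h = [row[::2] for row in h]              # keep alternate columns
    v = conv2(h, [[m] for m in mask])        # 5 x 1 mask: (R + 4) rows
    v = v[2:-2]                              # drop two border rows per side
    v = v[::2]                               # keep alternate rows
    return v


image = [[100] * 8 for _ in range(6)]        # a constant 6 x 8 test image
small = reduce_image(image)
```

For a 6 x 8 input the intermediate sizes are 6 x 12, 6 x 4, and 10 x 4, and the output is 3 x 4. Interior output pixels of a constant image keep their value because the mask values sum to one; border pixels are darker because of the zero padding.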

3.6  Round/Floor 

The  Round  and  Floor  commands  are  predefined  Matlab®  commands  that  do 
not  exist  in  VHDL.  The  Round  Module  will  take  an  input  and  round  it  up  or  down 
depending  on  what  the  fractional  part  of  the  number  is.  If,  for  example,  the  number 
is  4.5,  the  Round  Module  will  round  the  number  to  5  and  if  the  number  is  4.49  it  will 
round  the  number  to  4.  On  the  other  hand,  the  Floor  command  will  take  the  floor  of 
a  number.  For  example,  if  the  value  is  5.9  it  will  floor  the  value  to  5. 
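
On fixed-point data, both operations reduce to shifts and one addition. The sketch below assumes the 16 fractional bits used elsewhere in this chapter and non-negative inputs such as pixel values; the helper names are hypothetical. For negative values that fall exactly halfway between two integers, this add-and-shift rounding would round toward positive infinity, whereas the Matlab® round command rounds away from zero, so that case would need extra handling.

```python
FRAC_BITS = 16                       # assumed number of fractional bits
HALF = 1 << (FRAC_BITS - 1)          # fixed-point representation of 0.5


def to_fixed(x: float) -> int:
    """Quantize a real value to fixed point (round-to-nearest, an assumption)."""
    return round(x * (1 << FRAC_BITS))


def fixed_floor(q: int) -> int:
    """Floor: discard the fractional bits with an arithmetic right shift."""
    return q >> FRAC_BITS


def fixed_round(q: int) -> int:
    """Round: add one half, then discard the fractional bits."""
    return (q + HALF) >> FRAC_BITS


rounded_up = fixed_round(to_fixed(4.5))
rounded_down = fixed_round(to_fixed(4.49))
floored = fixed_floor(to_fixed(5.9))
```

The three results reproduce the examples in the text: 4.5 rounds to 5, 4.49 rounds to 4, and the floor of 5.9 is 5.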

3.7  Pseudoinverse  (Pinv) 

The Pseudoinverse, also known as Pinv, is used to calculate the inverse of a matrix. Not all matrices have an inverse; therefore, the Pseudoinverse is used to find a close approximation to a matrix inverse. Equation 3.4 is the equation used to calculate the Pseudoinverse [12] of Matrix A, with the answer stored in Y. The Matlab® code uses Equation 3.5 to do this calculation.

\[
Y = (A^T A)^{-1} A^T \tag{3.4}
\]

\[
Y = (A^T A)^{-1} (A^T B_{Vector}) \tag{3.5}
\]

The method used to build the Pseudoinverse Module will be to use the Matrix Transpose module to form the transpose of Matrix A and multiply it by the original Matrix A. While that is being done, we will multiply the transposed matrix by B_Vector, where B_Vector is the Ft derivative. With these two multiplications done in parallel, we will then perform the matrix inverse on the resultants using the LU-Factorization method [6]. The Pseudoinverse Module will include the following modules for its development:

1.  Memory  Module 

2.  Control  Module 

3.  Multiplier  Module 

4.  Divider  Module 

5.  Adder  Module 

6.  Matrix  Transpose  Module 

7.  Register  Module 
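
The data flow of Equation 3.5 can be sketched in software as follows. This is an illustrative Python model, not the VHDL design: it assumes that A has full column rank, so that the product of the transpose and A can be factored without pivoting, and it solves the resulting system with forward and back substitution instead of forming the inverse explicitly. All function names are hypothetical.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lu_decompose(m):
    """Doolittle LU factorization without pivoting: m = L * U."""
    n = len(m)
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lower[i][i] = 1.0
        for j in range(i, n):
            upper[i][j] = m[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        for j in range(i + 1, n):
            lower[j][i] = (m[j][i] - sum(lower[j][k] * upper[k][i] for k in range(i))) / upper[i][i]
    return lower, upper


def lu_solve(lower, upper, b):
    """Solve L*U*x = b by forward substitution followed by back substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(lower[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))) / upper[i][i]
    return x


def solve_normal_equations(a, b_vector):
    """Compute (A^T A)^-1 (A^T B_Vector) as in Equation 3.5."""
    a_t = transpose(a)
    ata = matmul(a_t, a)                                  # first multiplication
    atb = [sum(u * v for u, v in zip(row, b_vector)) for row in a_t]  # second one
    lower, upper = lu_decompose(ata)
    return lu_solve(lower, upper, atb)


A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b_vector = [1.0, 2.0, 3.0]
Y = solve_normal_equations(A, b_vector)
```

The two products are independent of each other, which is why they can be computed in parallel in hardware before the factorization step. The divisions in `lu_decompose` and `lu_solve` correspond to the work of the Divider Module.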

3.8 Synthesis/Timing

The Xilinx® Virtex-4 SX ML402 FPGA will be the target FPGA used to obtain the power, area, and delay of the circuits. The Virtex-4 is a general-purpose, moderate-cost FPGA that is a practical standard for DSP applications. The Virtex-4 ML402 is a good evaluation board with a wide range of applications such as DSP and low power. Precision RTL® 2007a.8 and Xilinx® ISE 9.2 are the two synthesis tools that will be used.

In one method, the two dimensional convolution module is built from the sequential and combinational modules that make up the Conv2. The Conv2 module does many multiplications and additions to compute the result for Matrix C. These sub-modules are built using combinational logic. Therefore, if the clock runs too fast, the multiplications and additions will not be completed before the next clock cycle, causing the wrong result to be stored in Matrix C. One way to eliminate this problem is to lower the clock speed.

A second method would be to place a pipeline register between the multiplier and adder, which increases throughput by allowing a higher clock speed. Timing is important for Dense Optical Flow because of the number of calculations required; even convolving two simple 2x2 matrices to create a 3 x 3 matrix requires many calculations. Timing simulations will show how fast the system can run to increase throughput. Timing simulations can also determine whether additional pipeline registers are required to increase throughput, and how slow the clock can run without including additional registers or hardware. By slowing the clock to a minimal speed, power can be saved in the design. All these things rely on the timing simulations. The Mentor Graphics® tools will also play a role in obtaining the results for Chapter 4 of this thesis project.

3.9  Testing  Procedure 

There are several ways to perform testing on the modules that will be developed. First, Modelsim® version 6.3c will be used to develop the VHDL code. The Modelsim® software will also be used to generate test benches to show the functionality of the modules. Once the VHDL modules are functioning, they will be run through the Precision RTL® 2007a.8 and Xilinx® ISE 9.2 software programs for synthesis to look at the power, area, and delay of the circuits. This will give an estimate of how to reduce the high power draw that occurs due to the large number of calculations required by Dense Optical Flow. This may require that the circuit clock speeds be slowed down to reduce power, which in turn can minimize circuit size and slow the speed of the system being tested.

3.10 Chapter Summary

The idea of breaking the Matlab® commands and functions used in image processing into synthesizable, reusable library modules was discussed. These modules will support AFRL/RY later when they want to create the Optical Flow as a system by taking the synthesizable reusable libraries and connecting them together to build an Optical Flow that can be fully designed in VHDL. The synthesis, timing, and testing procedures were introduced and discussed, along with the software that will be used to carry out this methodology. Each sub-module was thoroughly tested to ensure there will be no errors when the sub-modules are connected in final assembly to create an MSP module such as Conv2, Matrix Transpose, Round, Floor, or Pinv.

IV. Analysis and Results


This  chapter  will  discuss  the  overview  design  of  the  MSP  modules  that  were 
discussed  in  the  methodology  section.  We  will  also  look  at  the  power,  area,  and 
delay  for  the  MSP  modules.  We  will  also  look  at  the  error  analysis  calculated  between 
the  Matlab®  code  and  the  VHDL  modules. 

4.1 Overview of MSP Modules

The overarching module that needed to be created was the Conv2 Module. This module was used in the development of the Reduce Matrix Function and Computing Derivatives. The Top-Level Design 1 for the Reduce Matrix Function can be seen in Figure 4.1. The Reduce Matrix Function requires the Conv2 Module to perform the Conv2 Matlab® command. When the Conv2 command is used in Matlab®, the matrix grows larger due to the Conv2 algorithm. The hControl and vControl Reduce Modules are used to reduce the oversized matrix that is created when Conv2 is performed. The Reduce Matrix Function is used to reduce a matrix to half its size in each dimension. For example, an image of size 6 x 8 reduces to 3 x 4. The first Conv2 Module will generate a 6 x 12 matrix. The hControl Reduce Module then reduces the 6 x 12 matrix down to 6 x 4. The second Conv2 Module will generate a 10 x 4 matrix, which will be reduced to 3 x 4 using the vControl Reduce Module. The final image is stored in the Memory Module as a 3 x 4 matrix.


Figure 4.1: Reduce Matrix Function Top Level Design 1

Figure 4.2: Conv2 Sub-Modules Design


The modules that were used in the Top-Level Design 1 for the Conv2 Module can be seen in Figure 4.2. These modules make up the Conv2 Module. The first Conv2 Module uses a 1 x 5 Mask matrix, as opposed to the 5 x 1 Mask matrix that is used in the second Conv2 Module. The Conv2 Module design uses Multiplexer, Memory, Multiplier, and Controller Modules. They are connected together to achieve the result of the Conv2 Matlab® command.

The Behavior Multiplier can be seen in Figure 4.3. The three modules, Behavior Multiplier, Adder, and Register, are required in the development of the Conv2 command.

The  modules  that  are  required  to  build  a  2-bit  Booth  Multiplier  Module  can 
be  seen  in  Figure  4.4.  The  Booth  Multiplier  Module  requires  the  input  values  to  be 
loaded  into  the  Multiplicand  and  Product  Module.  The  Control  Module  is  used  to 
communicate  between  the  other  modules. 


Figure 4.3: Behavior Multiplier Module Design

Figure 4.4: Booth Multiplier Module Design


The  Top-Level  Design  2  that  was  used  to  develop  the  Reduce  Matrix  Function 
can  be  seen  in  Figure  4.5.  This  design  approach  eliminates  the  use  of  a  second 
Multiplier  Module  reducing  power  consumption  and  minimizing  area  compared  to 
Top-Level  Design  1.  Further  analysis  will  be  discussed  later  in  Chapter  4  to  see 
the  differences  between  the  two  design  methods.  Every  large  scale  design  requires  a 
Control  Module  to  communicate  between  the  other  modules.  The  Control  Module  is 
basically  the  “brains”  for  the  module. 


Figure 4.5: Reduce Matrix Function Top Level Design 2

The Control Module that was used for the Behavior Multiplier Module uses seven states for its state machine. The Control Module is used to communicate between the Multiplier Module and the other modules. The seven states that the Control Module uses to communicate with the Behavior Multiplier Module can be seen in Figure 4.6. The Booth Multiplier, on the other hand, requires that the Control Module be built using 24 states for its state machine if using a 32-bit input. The Booth Multiplier state machine can be seen in Figure 4.7. The Booth Multiplier performs its shift, add, and subtract operations during States S4-S20 for a 32-bit number. If the input were reduced to 24 bits, the Controller would require four fewer states.

The Compute Derivatives Top-Level Design can be seen in Figure 4.8. Again, you can see how the Conv2 Module was used to build the Compute Derivatives Module. This function requires the use of the Conv2, Controller, Adder, and Memory modules to build the derivative module in VHDL. The MSP design effort paid off in the creation of the MSP Compute Derivatives Module by making it possible to reuse previously designed modules.

Figure 4.7: Booth Multiplier State Machine Design

Figure 4.8: Compute Derivatives Top-Level Design

The  Matrix  Transpose  Module  requires  the  use  of  a  Multiplexer,  Controller,  and 
Memory.  The  Matrix  Transpose  Module  receives  data  through  the  Multiplexer  which 
loads  the  Memory.  The  Controller  is  used  to  transpose  the  values  from  the  Memory 
to  Memory  Transpose.  The  Top-Level  design  of  the  Matrix  Transpose  Module  can  be 
seen  in  Figure  4.9. 
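The data movement from Memory to Memory Transpose can be sketched in Python as a pair of address calculations. This is only an illustration: it assumes both memories are flat and stored in row-major order, and the function and variable names are hypothetical rather than taken from the VHDL design.

```python
def transpose_via_addresses(memory, rows, cols):
    """Copy a rows x cols matrix held in a flat memory into its transpose."""
    memory_transpose = [0] * (rows * cols)
    for r in range(rows):
        for c in range(cols):
            read_addr = r * cols + c    # element (r, c) of A
            write_addr = c * rows + r   # element (c, r) of A transpose
            memory_transpose[write_addr] = memory[read_addr]
    return memory_transpose
```

Because every value is copied unchanged, a scheme like this introduces no numerical error; only the addresses differ between the two memories.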
Figure 4.10: Pseudoinverse Top-Level Design

The Pseudoinverse (Pinv) Module can be broken into three modules: one computes $(A^T A)^{-1}$, one computes $A^T B_{vector}$, and the last performs the LU-Factorization. The Pinv Module Top-Level Design can be seen in Figure 4.10.
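As a rough software model of this decomposition, the Python sketch below forms $A^T A$ and $A^T B_{vector}$ and then solves the resulting system with an LU factorization. It is a floating-point illustration only: the Doolittle variant without pivoting is an assumption, the helper names are hypothetical, and the fixed-point divider and multipliers of the VHDL module are not modeled.

```python
def matmul_transpose_left(a, b):
    """Return a^T * b for matrices stored as lists of rows."""
    rows, n, m = len(a), len(a[0]), len(b[0])
    return [[sum(a[k][i] * b[k][j] for k in range(rows)) for j in range(m)]
            for i in range(n)]


def lu_decompose(matrix):
    """Doolittle LU factorization without pivoting: matrix = lower * upper."""
    n = len(matrix)
    lower = [[float(i == j) for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            upper[i][j] = matrix[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        for j in range(i + 1, n):
            lower[j][i] = (matrix[j][i]
                           - sum(lower[j][k] * upper[k][i] for k in range(i))) / upper[i][i]
    return lower, upper


def lu_solve(lower, upper, rhs):
    """Solve lower * upper * x = rhs by forward then backward substitution."""
    n = len(rhs)
    y = [0.0] * n
    for i in range(n):
        y[i] = rhs[i] - sum(lower[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(upper[i][k] * x[k] for k in range(i + 1, n))) / upper[i][i]
    return x


def pinv_solve(a, b_vector):
    """Least-squares solution x = (A^T A)^-1 A^T b, computed via LU."""
    ata = matmul_transpose_left(a, a)
    atb = [row[0] for row in matmul_transpose_left(a, [[v] for v in b_vector])]
    lower, upper = lu_decompose(ata)
    return lu_solve(lower, upper, atb)
```

For a 9 x 2 Matrix A and a 9 x 1 $B_{vector}$, `pinv_solve` returns the two entries of the 2 x 1 result. The repeated divisions inside `lu_decompose` and `lu_solve` are where a fixed-point divider would inject rounding error.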

4.2 Error Analysis

When VHDL modules are created to have the same functionality as the Matlab® commands and functions, error is introduced. Part of this error comes from the 1 x 5 Mask Matrix, or its 5 x 1 transpose, which holds the Gaussian distribution values in the Memory Matrix. The values 0.05 and 0.40 cannot be represented exactly in binary with a limited number of bits. We used 16 bits to represent the fractional part of the Mask Matrix values. Therefore, when the Conv2 Module multiplies the image by the mask, the result contains error, because the mask values are not exactly equal to the Matlab® values. The close approximations used for the Mask Matrix and the corresponding Matlab® values can be seen in Table 4.1. The error analysis equation that will be used is $\%\,\text{Error} = \frac{|\text{Matlab} - \text{VHDL}|}{\text{Matlab}} \times 100$.
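The size of this quantization error can be checked with a few lines of Python. The sketch assumes that each mask value is truncated (not rounded) to 16 fractional bits, and that the percent error is taken relative to the Matlab® value; under those assumptions it yields the VHDL values and percent errors listed in Table 4.1. The function names are illustrative.

```python
FRACTION_BITS = 16  # fractional bits used for the Mask Matrix values


def quantize_truncate(value, fraction_bits=FRACTION_BITS):
    """Truncate a non-negative value to the given number of fractional bits."""
    scale = 1 << fraction_bits
    return int(value * scale) / scale  # int() drops the bits that do not fit


def percent_error(reference, approximation):
    """Percent error of the approximation relative to the reference value."""
    return abs(reference - approximation) / abs(reference) * 100.0


gaussian_mask = [0.05, 0.25, 0.4, 0.25, 0.05]
vhdl_mask = [quantize_truncate(v) for v in gaussian_mask]
mask_errors = [percent_error(m, v) for m, v in zip(gaussian_mask, vhdl_mask)]
```

The value 0.25 is a power of two, so it survives truncation unchanged, while 0.05 and 0.4 each lose a small fraction of one least significant bit.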

Table 4.1: Comparison between Matlab® and VHDL Gaussian Distribution

Matlab® Gaussian Distribution   0.05          0.25   0.4           0.25   0.05
VHDL Gaussian Distribution      0.049987793   0.25   0.399993896   0.25   0.049987793
% Error Difference              0.024414      0      0.001526      0      0.024414

Figure 4.11: Reduce Matrix Function Matlab® vs. VHDL Percent Error

The error analysis for the Reduce Matrix Function, comparing the VHDL output against the Matlab® output, can be seen in Figure 4.11. The largest error that occurred was 0.0059% for the sample 6 x 8 data set that we used. The 6 x 8 data set can be seen in the Input Data Matrix; it was chosen arbitrarily. The Reduce Matrix Function reduces the 6 x 8 matrix down to 3 x 4, but only after the Conv2 Module has been applied to the 6 x 8 matrix. In Figure 4.11, the x-axis represents the positions in the 3 x 4 matrix and the y-axis shows the percent error between the Matlab® output and the VHDL output.
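The flow of the Reduce Matrix Function (filter first, then shrink) can be sketched in Python as follows. Several details are assumptions made for illustration and are not confirmed by the text: the filter output keeps the input size with zero padding at the borders, the 1 x 5 and 5 x 1 masks are applied one after the other, and the reduction keeps every other row and column starting at index 0. All names are hypothetical.

```python
MASK = [0.05, 0.25, 0.4, 0.25, 0.05]  # Gaussian mask values from Table 4.1


def convolve_rows(image, mask):
    """Apply a 1 x N mask along each row, zero-padded, same-size output."""
    half = len(mask) // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for k, weight in enumerate(mask):
                cc = c + k - half
                if 0 <= cc < cols:  # samples outside the image count as zero
                    acc += weight * image[r][cc]
            out[r][c] = acc
    return out


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def reduce_matrix(image, mask=MASK):
    """Blur with the 1 x 5 and 5 x 1 masks, then keep every other sample."""
    blurred = convolve_rows(image, mask)                           # 1 x 5 pass
    blurred = transpose(convolve_rows(transpose(blurred), mask))   # 5 x 1 pass
    return [row[::2] for row in blurred[::2]]
```

With a 6 x 8 input, `reduce_matrix` returns a 3 x 4 result, matching the dimensions quoted above. Because the mask is symmetric, convolution and correlation give the same answer here.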
The Compute Derivatives Module had zero error compared to the Matlab® code. This is because the Compute Derivative equation uses a positive or negative 2 x 2 matrix with the values 0.25, which can be represented in binary with zero percent error. The Matrix Transpose Module also had zero error because it simply takes the original Matrix A and transposes it to become Matrix $A^T$. There was no loss of data precision when transposing Matrix A.
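A small Python sketch shows why 2 x 2 kernels of plus or minus 0.25 add no error of their own: 0.25 is exactly $2^{-2}$, so each product is an exact shift. The kernel layout below follows a common two-frame derivative estimate for optical flow; it is an assumption for illustration, since the exact kernels and the orientation used by the Conv2 Module are not given here, and all names are hypothetical.

```python
KERNEL_X = [[-0.25, 0.25], [-0.25, 0.25]]    # horizontal difference
KERNEL_Y = [[-0.25, -0.25], [0.25, 0.25]]    # vertical difference
KERNEL_T_NEW = [[0.25, 0.25], [0.25, 0.25]]  # applied to the newer frame
KERNEL_T_OLD = [[-0.25, -0.25], [-0.25, -0.25]]  # applied to the older frame


def apply_2x2(image, kernel):
    """Weighted sum of each fully covered 2 x 2 neighbourhood."""
    rows, cols = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(2) for j in range(2))
             for c in range(cols - 1)]
            for r in range(rows - 1)]


def add(a, b):
    """Element-wise sum of two equally sized matrices (the Adder's role)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def compute_derivatives(frame1, frame2):
    """Return (Fx, Fy, Ft) estimated from two consecutive frames."""
    fx = add(apply_2x2(frame1, KERNEL_X), apply_2x2(frame2, KERNEL_X))
    fy = add(apply_2x2(frame1, KERNEL_Y), apply_2x2(frame2, KERNEL_Y))
    ft = add(apply_2x2(frame2, KERNEL_T_NEW), apply_2x2(frame1, KERNEL_T_OLD))
    return fx, fy, ft
```

For integer pixel values every intermediate result is a multiple of 0.25, so two fractional bits are enough to hold it exactly.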

The sample test data set used to test the Pseudoinverse Module was a 9 x 2 Matrix A and a 9 x 1 Matrix $B_{vector}$. The comparison between the Matlab® and VHDL results can be seen in Table 4.2. The results are stored in a 2 x 1 matrix. The largest percent error for the sample test data was 2.65%. A small amount of error comes from the divider that is used to calculate the Pseudoinverse. This small error is magnified when the value is multiplied, and the intermediate result is divided and multiplied again as part of the LU-Factorization used to find the Pseudoinverse.
Table 4.2: Comparison between Matlab® and VHDL Pseudoinverse Module

                     Matrix Position (1, 1)   Matrix Position (2, 1)
Matlab® Results      0.731775                 0.305488
VHDL Results         0.716705                 0.313827
% Error Difference   2.10                     2.65
Figure 4.12: Reduce Matrix Function Module Net Power (mW) vs. Clock Speed


4.3 Power, Area, and Delay

This section presents the power, area, and delay results obtained for the MSP modules using the Cadence® synthesis tool. The Cadence® software did not specifically target an FPGA or an ASIC for these results.

First, we look at the results for the Reduce Matrix Function, seen in Figure 4.12. The chart shows that the Top-Level Design 2 approach requires less net power for the Reduce Matrix Function Module: Top-Level Design 2 consumes at least 2.8 times less net power than Top-Level Design 1. For the designs in Figure 4.12, the Cadence® software also reported that the maximum clock speed for Top-Level Design 1 using a Behavior Multiplier Module was 52.02 MHz. The other designs ran at 53.86 MHz.

Figure 4.13 shows that Top-Level Design 2 uses a smaller area, up to 54.6% smaller, compared to the Behavior Multiplier Top-Level Design 1 and Booth Multiplier Top-Level Design 2.

The results for Fx, Fy, and Ft can be seen in Figure 4.14 for power vs. clock speed using the Cadence® software. As expected, the net power increases as the clock speed increases. The clock speeds and areas shown in the charts are only estimates generated by the Cadence® software. Every software package uses different algorithms to estimate power, area, and delay.

Figure 4.13: Reduce Matrix Function Module Cell Area vs. Clock Speed

Figure 4.14: Compute Derivatives Module Net Power (mW) vs. Clock Speed

Figure 4.15: Compute Derivatives Module Cell Area vs. Clock Speed

The cell area for the Compute Derivatives Module was unchanged as the clock speed increased, as shown in Figure 4.15. For example, the same cell area is achieved for a 10 MHz design and a 52 MHz design.

The Power vs. Clock Speed results for the Matrix Transpose Module are shown in Figure 4.16. The net power increased as the clock speed increased, which is expected. The cell area was unchanged when the clock speed was increased. This can be seen in Figure 4.17.

Figure 4.16: Matrix Transpose Module Net Power (mW) vs. Clock Speed

Figure 4.17: Matrix Transpose Module Cell Area vs. Clock Speed

The  Pseudoinverse  Module  achieved  the  results  for  Power  vs.  Clock  Speed  in 
Figure  4.18.  The  net  power  increased  as  the  clock  speed  increased  which  is  what  is 
expected.  The  cell  area  was  unchanged  when  the  clock  speed  was  increased.  This  can 
be  seen  in  Figure  4.19. 
Figure 4.18: Pseudoinverse Module Net Power (mW) vs. Clock Speed

Figure 4.19: Pseudoinverse Module Cell Area vs. Clock Speed

4.4 Synthesis

In the synthesis section, all the modules were targeted for the Xilinx® Virtex-4 SX ML402 FPGA, which uses 90 nm technology. The device used was the 4VSX35FF668, and a speed grade of -10 was used to obtain area and speed for the synthesis. The Xilinx® Virtex-4 board is not a state-of-the-art board and is therefore very cost efficient. Without a doubt, high-end FPGA systems will run much faster (2-3 times the speed) than the current one, with Configurable Logic Blocks (CLB) to spare. Each synthesis software tool uses different algorithms to optimize power, area, and delay for a design. For that reason, and because the Cadence® software used 250 nm technology to obtain the area and delay for the modules, the results in this section will not match the Cadence® results. The Reduce Matrix Function Module was synthesized using the Precision RTL® software, which is a Mentor Graphics® product.

The  Reduce  Matrix  Function  Module  for  the  Behavior  Multiplier  Top-Level 
Design  1  results  can  be  seen  in  Table  4.3.  The  constraint  for  the  frequency  used 
was  76  MHz.  After  synthesis,  the  maximum  clock  speed/frequency  was  76.482  MHz. 
Therefore,  the  synthesis  had  no  timing  violations.  The  Reduce  Matrix  Function  used 
only  10.24%  of  the  FPGA  area.  There  are  a  total  of  15360  CLB  slices  available  for 
the  targeted  FPGA. 
Table 4.3: FPGA Synthesize Results for Top-Level 1, Behavior Multiplier

Resources             Used   Available   Utilization %
Inputs & Outputs      264    448         59.93
Global Buffers        1      32          3.13
Function Generators   3145   30720       10.24
CLB Slices            1573   15360       10.24
Dffs or Latches       2027   31616       6.41
Block RAMs            5      192         2.60
DSP48s                10     192         5.21

Table 4.4: FPGA Synthesize Results for Top-Level 2, Behavior Multiplier

Resources             Used   Available   Utilization %
Inputs & Outputs      166    448         37.05
Global Buffers        1      32          3.13
Function Generators   2725   30720       8.87
CLB Slices            1362   15360       8.87
Dffs or Latches       1670   31616       5.28
Block RAMs            4      192         2.08
DSP48s                8      192         4.17

The results for the Reduce Matrix Function Module using the Behavior Multiplier Top-Level Design 2 can be seen in Table 4.4. The frequency constraint was 73 MHz. After synthesis, the maximum clock frequency was 73.115 MHz, so the synthesis had no timing violations. Design 2 requires only 8.87% of the CLB slices; Design 1 uses 15.49% more CLB slices than Design 2 (1573 versus 1362).

The results for the Reduce Matrix Function Module using the Booth Multiplier Top-Level Design 1 can be seen in Table 4.5. The frequency constraint was 100 MHz. After synthesis, the maximum clock frequency was 101.75 MHz, so the synthesis had no timing violations. Design 1 requires only 11.32% of the CLB slices. This is about a 10% increase in area over the Behavior Design 1. In exchange, the Booth design reaches a maximum clock frequency about 33% higher than Behavior Design 1 (101.75 MHz versus 76.482 MHz) and 39.1% higher than Behavior Design 2 (73.115 MHz).

Table 4.5: FPGA Synthesize Results for Top-Level 1, Booth Multiplier

Resources             Used   Available   Utilization %
Inputs & Outputs      268    448         59.82
Global Buffers        1      32          3.13
Function Generators   3475   30720       11.31
CLB Slices            1738   15360       11.32
Dffs or Latches       2387   31616       7.55
Block RAMs            5      192         2.60
DSP48s                6      192         3.13

Table 4.6: FPGA Synthesize Results for Top-Level 2, Booth Multiplier

Resources             Used   Available   Utilization %
Inputs & Outputs      173    448         38.62
Global Buffers        1      32          3.13
Function Generators   2887   30720       9.40
CLB Slices            1444   15360       9.40
Dffs or Latches       1867   31616       5.91
Block RAMs            4      192         2.08
DSP48s                6      192         3.13

The results for the Reduce Matrix Function Module using the Booth Multiplier Top-Level Design 2 can be seen in Table 4.6. The frequency constraint was 100 MHz. After synthesis, the maximum clock frequency was 100.422 MHz, so the synthesis had no timing violations. Design 2 requires only 9.40% of the CLB slices. This is a 6% increase in area over the Behavior Design 2. However, the Booth design has a 37.34% increase in maximum clock frequency over the Behavior Design 2. With the Booth multiplier, Design 1 uses 20.36% more CLB slices than Design 2 (1738 versus 1444).
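The utilization and comparison figures quoted for the four Reduce Matrix Function designs follow from simple ratios of the table entries. The Python sketch below recomputes them from the CLB slice counts and maximum clock frequencies reported in this section; the dictionary keys and function names are illustrative only.

```python
CLB_SLICES_AVAILABLE = 15360  # CLB slices on the targeted FPGA

# CLB slices used and maximum clock frequency (MHz) reported for each design.
DESIGNS = {
    "behavior_design_1": {"slices": 1573, "fmax_mhz": 76.482},
    "behavior_design_2": {"slices": 1362, "fmax_mhz": 73.115},
    "booth_design_1": {"slices": 1738, "fmax_mhz": 101.75},
    "booth_design_2": {"slices": 1444, "fmax_mhz": 100.422},
}


def utilization_percent(used, available=CLB_SLICES_AVAILABLE):
    """Share of the available resource that is used, in percent."""
    return 100.0 * used / available


def percent_more(value, baseline):
    """How much larger `value` is than `baseline`, as a percentage of baseline."""
    return 100.0 * (value - baseline) / baseline


def compare(design, baseline):
    """Return (extra area %, extra fmax %) of `design` relative to `baseline`."""
    d, b = DESIGNS[design], DESIGNS[baseline]
    return (percent_more(d["slices"], b["slices"]),
            percent_more(d["fmax_mhz"], b["fmax_mhz"]))
```

For example, `compare("booth_design_2", "behavior_design_2")` gives roughly 6% more area and roughly 37% more clock frequency, and `compare("behavior_design_1", "behavior_design_2")` gives the 15.49% area difference between the two Behavior Multiplier designs.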

The synthesis results for the Compute Derivatives Fx, Fy, and Ft Modules can be seen in Table 4.7, Table 4.8, and Table 4.9, respectively. The frequency constraint for the Compute Derivative Fx Module was 105 MHz. After synthesis, the maximum clock frequency was 105.876 MHz, so the synthesis had no timing violations.

The frequency constraint for the Compute Derivative Fy Module in Table 4.8 was 105 MHz. After synthesis, the maximum clock frequency was 105.876 MHz, so the synthesis had no timing violations.
Table 4.7: FPGA Synthesize Results for Compute Derivative Fx Module

Resources             Used   Available   Utilization %
Inputs & Outputs      106    448         23.66
Global Buffers        1      32          3.13
Function Generators   3439   30720       11.19
CLB Slices            1720   15360       11.20
Dffs or Latches       2187   31616       6.92
Block RAMs            5      192         2.60
DSP48s                4      192         2.08

Table 4.8: FPGA Synthesize Results for Compute Derivative Fy Module

Resources             Used   Available   Utilization %
Inputs & Outputs      106    448         23.66
Global Buffers        1      32          3.13
Function Generators   3439   30720       11.19
CLB Slices            1720   15360       11.20
Dffs or Latches       2187   31616       6.92
Block RAMs            5      192         2.60
DSP48s                4      192         2.08

The frequency constraint for the Compute Derivative Ft Module in Table 4.9 was 101 MHz. After synthesis, the maximum clock frequency was 101.75 MHz, so the synthesis had no timing violations.


Table 4.9: FPGA Synthesize Results for Compute Derivative Ft Module

Resources             Used   Available   Utilization %
Inputs & Outputs      106    448         23.66
Global Buffers        1      32          3.13
Function Generators   3368   30720       10.96
CLB Slices            1684   15360       10.96
Dffs or Latches       2183   31616       6.90
Block RAMs            5      192         2.60
DSP48s                4      192         2.08

The frequency constraint for the Matrix Transpose Module in Table 4.10 was 158 MHz. After synthesis, the maximum clock frequency was 158.856 MHz, so the synthesis had no timing violations. The Matrix Transpose Module required only 3.22% of the CLB slices.
Table 4.10: FPGA Synthesize Results for Matrix Transpose Module

Resources             Used   Available   Utilization %
Inputs & Outputs      68     448         15.18
Global Buffers        1      32          3.13
Function Generators   990    30720       3.22
CLB Slices            495    15360       3.22
Dffs or Latches       588    31616       1.86
Block RAMs            2      192         1.04
DSP48s                2      192         1.04

The frequency constraint for the Floor Module was 471 MHz. After synthesis, the maximum clock frequency was 471.032 MHz, so the synthesis had no timing violations. The results can be seen in Table 4.11. The Floor Module uses only 0.01% of the CLB slices.


Table 4.11: FPGA Synthesize Results for Floor Module

Resources             Used   Available   Utilization %
Inputs & Outputs      9      448         2.01
Global Buffers        1      32          3.13
Function Generators   4      30720       0.01
CLB Slices            2      15360       0.01
Dffs or Latches       4      31616       0.01
Block RAMs            0      192         0.00
DSP48s                0      192         0.00

The frequency constraint for the Round Module was 148 MHz. After synthesis, the maximum clock frequency was 148.236 MHz, so the synthesis had no timing violations. The results can be seen in Table 4.12. The Round Module uses 3.67% of the CLB slices.

The frequency constraint for the Pseudoinverse Module was 33 MHz. After synthesis, the maximum clock frequency was 34.58 MHz, so the synthesis had no timing violations. The results can be seen in Table 4.13. The Pseudoinverse Module uses 28.31% of the CLB slices.

Table 4.12: FPGA Synthesize Results for Round Module

Resources             Used   Available   Utilization %
Inputs & Outputs      68     448         15.18
Global Buffers        1      32          3.13
Function Generators   1128   30720       3.67
CLB Slices            564    15360       3.67
Dffs or Latches       582    31616       1.84
Block RAMs            1      192         0.52
DSP48s                2      192         1.04

Table 4.13: FPGA Synthesize Results for Pseudoinverse Module

Resources             Used   Available   Utilization %
Inputs & Outputs      339    448         75.67
Global Buffers        2      32          6.25
Function Generators   8698   30720       28.31
CLB Slices            4349   15360       28.31
Dffs or Latches       4986   31616       15.77
Block RAMs            14     192         7.29
DSP48s                22     192         11.46

The summary of maximum clock speed can be seen in Table 4.14. The Precision RTL® software tool showed results similar to those of the Cadence® software tool. The Reduce Matrix Function Booth Multiplier Design outperforms the Behavior Multiplier design by up to 39.1% in maximum clock speed.

Table 4.14: Summary of Maximum Clock Speed for Modules

Module Name                                               Maximum Clock Speed (MHz)
Reduce Matrix Function Top-Level 1 Behavior Multiplier    76.48
Reduce Matrix Function Top-Level 2 Behavior Multiplier    73.11
Reduce Matrix Function Top-Level 1 Booth Multiplier       101.75
Reduce Matrix Function Top-Level 2 Booth Multiplier       100.42
Compute Derivative Fx                                     105.87
Compute Derivative Fy                                     105.87
Compute Derivative Ft                                     101.75
Matrix Transpose                                          158.85
Floor                                                     471.03
Round                                                     148.23
Pseudoinverse                                             34.58

The  summary  of  the  net  power  in  Watts  can  be  seen  in  Table  4.15.  The 
Precision  RTL®software  tool  showed  similar  results  as  the  Cadence®software  tool. 
Table 4.15: Summary of Net Power Watts for Modules

Module Name                                               Net Power (Watts)
Reduce Matrix Function Top-Level 1 Behavior Multiplier    0.672
Reduce Matrix Function Top-Level 2 Behavior Multiplier    0.662
Reduce Matrix Function Top-Level 1 Booth Multiplier       0.703
Reduce Matrix Function Top-Level 2 Booth Multiplier       0.687
Compute Derivative Fx                                     0.700
Compute Derivative Fy                                     0.708
Compute Derivative Ft                                     0.700
Matrix Transpose                                          0.668
Floor                                                     0.636
Round                                                     0.660
Pseudoinverse                                             0.646

4.5 Chapter Summary

This chapter gave an overview of the designs that were used to build the MSP synthesizable reusable libraries. Top-Level Design schematics were used to show how the modules connect together. The error analysis examined the error introduced by using the Gaussian Distribution for the Mask Matrix Modules. These modules used values that were close approximations to the Matlab® values. The error results showed a maximum of only 0.0059% error for our arbitrary data set. The power, area, and delay section used the Cadence® software to obtain the results for each module. It was observed that the power consumption increased as the clock speed increased. In some cases, the smallest cell area occurred when the delay was at its maximum. The synthesis section used the Precision RTL® software to target the Xilinx® Virtex-4 SX ML402 FPGA to obtain area and speed for the different modules.
V. Conclusions


5.1  Summary  of  the  Project 

The problem was to convert Matlab® commands and functions into VHDL components. These commands are used in image processing applications such as Optical Flow. Fully implementing Lucas-Kanade Optical Flow using these libraries may take up to six months. The foundation has been laid through the synthesizable reusable libraries. The development of these MSP modules has shortened the development time for an FPGA or ASIC design. The MSP modules were implemented and validated, and they form fully synthesizable, functional, reusable libraries. Chapter 2 discussed the VLSI Design Process and how long it takes to complete a design. Architecture techniques were used in the development of the MSP libraries. The risks were discussed, along with the reasons why FPGAs are the right solution for getting a design fielded quickly. Chapter 3 discussed the methodology and design structure that was used to aid in the creation of the MSP libraries. Chapter 4 discussed the analysis and results that were achieved from the design methodology. These libraries make it possible for the designer to save time, money, and resources in the VLSI Design Process. The modules that were created are reconfigurable to meet the needs of the ever-changing Air Force. These modules can be used for future Air Force projects that require change detection, computer vision, pattern recognition, target tracking, and image processing.

5.2  Future  Work 

The designs of the MSP Modules were shown to work successfully through simulations and were synthesized using the Cadence® and Precision RTL® software tools. The Cadence® results were based on the 250-nm technology library from the Taiwan Semiconductor Manufacturing Company. The Precision RTL® software targeted designs for the Xilinx® Virtex-4 SX ML402 FPGA. This section will discuss three future research topics.
5.2.1 Improve Module Designs. The modules designed are just the beginning. This project provides baseline modules that can be enhanced for many projects to come. For example, the Reduce Matrix Function Module could have pipelining and parallelization added to it to improve the delay of the Module. This would require that the Conv2 Module also have additional multipliers, controllers, pipelining, and parallelization. The Memory Module could be enhanced to accept multiple reads and writes during the same clock cycle. Breaking the Control Module into smaller control units would help parallelize the multiplications so that they complete more quickly. This will shorten the delay, but the area of the design will grow due to the additional hardware. These are just a few examples; the possibilities are endless.

5.2.2 Power, Area, and Delay. Optimized power, area, and delay modules can be developed for each Matlab® command or function that makes up the Lucas-Kanade method. These additional modules will give the designer more flexibility and options to choose from for their next design. For example, their arsenal can have three different versions of a multiplier: Multiplier_Power Module, Multiplier_Area Module, and Multiplier_Delay Module. The designer will have the option to mix and match these modules for future needs of the Air Force.

5.2.3 Complex Numbers. Currently, the modules are based on fixed-point notation. Enhancements and variations can be made to handle floating-point and complex numbers. A floating-point number gives more range than a fixed-point number. Complex numbers, in turn, open the door to potential uses such as Digital Signal Processing (DSP), image processing, target tracking, etc. DSP applications use complex numbers: DSP requires complex numbers in order to work in the time and frequency domains using Laplace transforms.
Appendix A. Reducing Power Consumption


This  section  will  discuss  other  ways  to  reduce  power  consumption  for  designing 
an  architecture  that  requires  low  power. 


A.1 Clock Speed/Clock Gating

One of the six options for reducing the amount of power in a circuit is to reduce the speed of the clock. Reducing the clock speed of a circuit reduces its switching activity, and limiting the amount of switching activity that has to take place saves power. Besides reducing the speed of the clock for the circuit, designers have also proposed:

Clock  gating  by  modifying  the  design  of  the  existing  energy  recovery 
clocked  flip-flops  to  incorporate  a  power  saving  feature  that  eliminates 
any  energy  loss  on  the  internal  clock  and  other  nodes  of  the  flip-flops. 
Applying  this  approach  of  clock  gating  for  a  system  that  has  1000  flip- 
flops  with  50%  percent  data  switching  eliminates  the  total  power  of  the 
circuit  by  47%  and  also  found  that  during  sleep  mode  the  flip-flops  reduce 
their power by (1000x) and has negligible amounts of overhead when the
flip-flops  are  in  active  mode  [13]. 

The proposed method that achieved these results [13] "inserted the gating feature inside the flip-flops themselves." This design can be seen in Figure A.1. Figure A.1a is the Single-Ended Conditional Capturing Energy Recovery (SCCER) flip-flop, Figure A.1b is the Static Differential Energy Recovery (SDER) flip-flop, and Figure A.1c is the Differential Conditional Capturing Energy Recovery (DCCER) flip-flop.
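Both a slower clock and clock gating act on the same first-order relation for CMOS dynamic power, $P = \alpha C V^2 f$, where $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V$ the supply voltage, and $f$ the clock frequency. The Python sketch below evaluates that relation; the numeric values in the usage note are hypothetical and are not measurements from this work or from [13].

```python
def dynamic_power_watts(activity, capacitance_farads, supply_volts, clock_hz):
    """First-order CMOS dynamic power: P = activity * C * V^2 * f."""
    return activity * capacitance_farads * supply_volts ** 2 * clock_hz


def power_ratio(new_power, old_power):
    """Fraction of the original power that remains after a change."""
    return new_power / old_power
```

With hypothetical values of 1 nF switched capacitance and a 2.5 V supply, halving the clock from 50 MHz to 25 MHz halves the dynamic power, and gating the clock so that the activity factor drops from 0.5 to 0.25 has the same effect without slowing the clock.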

Another option for reducing clock power is an asynchronous circuit design, also known as a clockless design [2]. This design eliminates the clock altogether, which saves power. Arsalan and Shams state the following about asynchronous designs [2]:

The  asynchronous  circuit  design  has  been  long  regarded  as  a  way  for  solv¬ 
ing  the  problem  of  synchronous  circuit  design  such  as  clock  skew,  worst 
case  delay,  and  heavy  global  clock  loading.  When  designed  carefully,  asyn¬ 
chronous  circuits  can  be  more  power  efficient  as  compared  to  their  syn¬ 
chronous  counterparts. 
(a) Clock gating SCCER

(b) Clock gating SDER

(c) Clock gating DCCER

Figure A.1: Energy Recovery Clocked Flip-Flops with Clock Gating [13]

A.2 Turning Off The Circuit

A second option to reduce power consumption is to set up a circuit so that processes are idled or turned off when not in use. This approach can be broken into three schemes: timeout-based, predictive, and stochastic [4]. The timeout-based scheme is self-explanatory: when a task has not been used for a given time period, it is timed out until it is required again. The predictive scheme uses an algorithm to predict when the circuit should turn different portions on and off, depending on its requirements. Partially shutting down the system when it is not in use reduces the power consumption. Partial shutdown puts a section of the system into idle or sleep mode depending on the operation that is being performed. A process in idle mode wakes up without waiting to reboot. A process in sleep mode is turned off when not in use. The downsides are that idle mode requires more power to keep the system readily available, and sleep mode takes longer for that section of the circuit to transition to run mode. The third scheme is stochastic. This "approach formulates policy optimization of Dynamic Power Management (DPM) as an optimization problem under uncertainty rather than trying to eliminate uncertainty by prediction" [4].
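The timeout-based scheme is simple enough to sketch in a few lines of Python. The model below is a deliberately minimal illustration and not the policy from [4]: it assumes a per-cycle busy flag, a single timeout threshold, and an instantaneous wake-up, and it ignores the extra power of idle mode and the wake-up latency of sleep mode discussed above. The function name and state labels are hypothetical.

```python
def timeout_policy(busy_trace, timeout_cycles):
    """Return the power state chosen for each cycle: 'run', 'idle', or 'sleep'.

    A block stays in 'idle' while it has been unused for at most
    `timeout_cycles` consecutive cycles, then drops to 'sleep' until
    it is needed again.
    """
    states = []
    unused_cycles = 0
    for busy in busy_trace:
        if busy:
            unused_cycles = 0
            states.append("run")
        else:
            unused_cycles += 1
            states.append("idle" if unused_cycles <= timeout_cycles else "sleep")
    return states
```

A shorter timeout moves the block into sleep mode sooner and saves more power, at the cost of more frequent slow wake-ups; a predictive or stochastic policy would replace the fixed threshold with an estimate of how long the block will stay unused.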
A.3 Recycle Power

Thirdly, recycling the power in the circuit is another possible way to reduce power consumption. Switching activity is the major cause of power consumption, because the circuit is constantly charging and discharging the load and parasitic capacitance [16]. If the power is recycled back into the voltage supply, instead of being discharged to ground where it is lost, up to 80% of the power consumption can be saved [16]. Most circuits use Direct Current (DC) power supplies. If an Alternating Current (AC) power supply is used instead, the power can be recycled. This approach can be seen in Figure A.2 [16], which uses two diodes and P-channel and N-channel transistors to keep the load from discharging. The power source supplies a sinusoidal waveform. While the signal is at 0 volts, the P-channel transistor is turned on and begins charging the load capacitor. This causes the diode connected to the P-channel transistor to sit one diode drop below the Peak Voltage (Vp). During all this, the N-channel transistor is turned off. The diode that is connected to the P-channel transistor holds the load voltage at Vp. At the Vp of the sinusoidal waveform, the N-channel transistor is turned on and the P-channel transistor is turned off. Both the P-channel and N-channel transistors are turned off during the negative cycle of the sinusoidal signal [16].
A.4 Redesign Logic Of The Circuit

Another option is to reduce the number of floating-point operations in the circuit
by replacing them with a fixed-point representation. It is also possible to trade
off parallel and serial logic design in the circuit. Power can be saved by using
parallelization, pipelining, or a systolic architecture. Parallelization makes it
possible to slow the clock while still moving the same data through the circuit.
A slower clock means less clock switching, which results in power savings. For
example, a sequential design requires a clock, and the resulting continuous
switching activity causes it to consume more power than a combinational design.
Adders and multipliers in a sequential circuit are a good example of how additional
switching activity increases power consumption. The power consumption of a
sequential circuit can therefore be reduced by lowering the clock speed of the
circuit.
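The replacement of floating-point values with fixed-point ones can be sketched in a few lines of Python. This is only an illustration: the 12-bit fractional split (`FRAC_BITS`) and the helper names are assumptions made for the example and are not taken from the VHDL modules. The mask values are the Gaussian mask used by the Reduce function in Appendix B.

```python
def to_fixed(value, frac_bits):
    """Quantize a real value to a signed integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))


def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point integers and rescale to frac_bits fractional bits."""
    return (a * b) >> frac_bits


def to_float(fixed, frac_bits):
    """Convert a fixed-point integer back to a float for inspection."""
    return fixed / (1 << frac_bits)


FRAC_BITS = 12  # assumed fractional width, chosen only for this sketch

mask = [0.05, 0.25, 0.4, 0.25, 0.05]
mask_fx = [to_fixed(m, FRAC_BITS) for m in mask]

# One multiply of a pixel value by the centre tap, done entirely on integers.
pixel = to_fixed(100.0, FRAC_BITS)
product = fixed_mul(pixel, mask_fx[2], FRAC_BITS)
error = abs(to_float(product, FRAC_BITS) - 100.0 * 0.4)
```

The quantization error stays small relative to the pixel range, while every operation is an integer multiply and shift that maps directly onto hardware.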

A.5 Higher Level of Integration

Changing the way the circuit is integrated can help reduce power. Using
a System on Chip approach, a non-CMOS technology, or a smaller lambda (feature
size), for example moving from 250 nm to 90 nm, can reduce the power. Reducing the
technology size of the design reduces the Threshold Voltage (Vth) required to turn
on the transistors. As seen in Equation A.1, reducing the voltage or the current
causes the power usage to go down, where P is power, V is voltage, and I is
current. Lowering Vth therefore results in a power savings.

P = VI    (A.1)

A.6 Dynamic Power Management

Finally, dynamic power management is another approach to reducing power
consumption in a circuit design. It has been stated [14] that “Dynamic power
management is one of the most popular and successful low power design techniques in
commercial integrated circuits”. This approach is built around the idea of not
spending power on parts of the circuit that do not need to be operating. It is
also easy to build into a new circuit design or to add to an older design
structure. One of the best features of DPM is that in most cases the area of the
system will not increase much and the performance will not be reduced. DPM is a
collection of techniques that reduce power consumption in the circuit: Clock
Gating, Qualified System Latches, Guarded Evaluation, Bus Deactivation, and
Self-timed Techniques. Each of these power saving strategies is described in
further detail in the following subsections.

Figure A.3: Illustration of Clock Gating

A.6.1 Clock Gating. Clock gating was touched on earlier, but it is still
important to note that circuits and microprocessors are typically designed for high
performance. Performance in a design requires that the system clock operate as
quickly as possible. This in turn makes the clock the largest consumer of power
in the system [14] because of the amount of switching activity that takes place.
The clock tree also puts an additional load on the system, which further increases
power consumption. The idea is to eliminate unnecessary activity on segments of
the clock signal by gating them with special qualifying signals. This can be seen
in Figure A.3.
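The effect of gating a clock segment with a qualifying signal can be illustrated with a small counting model. The sketch below is not a circuit simulation; the function name and the activity pattern are hypothetical and only show how many clock edges reach a register with and without gating.

```python
def count_register_clock_edges(enable_per_cycle, gated):
    """Count the rising clock edges that reach a register.

    enable_per_cycle: list of booleans, True when the register must capture data.
    gated: when True, the clock is qualified (ANDed) with the enable signal.
    """
    edges = 0
    for enable in enable_per_cycle:
        if not gated or enable:
            edges += 1
    return edges


# Hypothetical activity: the register is needed in 3 out of 10 cycles.
activity = [True, False, False, True, False, False, False, True, False, False]
ungated_edges = count_register_clock_edges(activity, gated=False)
gated_edges = count_register_clock_edges(activity, gated=True)
```

In this pattern the gated register sees only the three edges on which it actually captures data, so the switching on that clock segment drops accordingly.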
Figure A.4: Guarded evaluation for a dual function ALU: (a) Original design, (b) Guarded design

A.6.2 Qualified System Latches. According to Tiwari [14], “regular system
latches have a data input and a control input that is usually connected to the system
clock. A qualified latch has an additional input, the “enable” input, which determines
if the inputs to the latch will be read or not when the clock is asserted.” These
latches can save more power than regular latches, because a regular latch uses
power every time the system clock switches from low to high, even when the data
across the latch is not changing.


A.6.3 Guarded Evaluation. The previous two subsections showed how the system
clock requires large amounts of power to operate. Guarded evaluation comes into
play when the clock cannot be slowed and no further corrections can be made to
reduce the amount of power the clock is drawing. For instance, [14] “if the
combinational block at the output of a system latch is known to be doing no useful
work, power can be saved by disabling transitions from entering the block.” It can
also be beneficial for a set of [14] “modules that share a set of inputs”.
Enabling only the module that is in use keeps power from going to a module that is
not being used. This is seen in Figure A.4.
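The idea of keeping transitions away from an unused module can be sketched as a simple counting model. The example assumes two modules sharing one input bus, as in a dual function ALU; the function name, the operand sequence, and the select pattern are all hypothetical.

```python
def input_transitions(operands, selects, guarded):
    """Count operand changes seen at the inputs of two modules sharing a bus.

    operands: sequence of input values, one per cycle.
    selects:  sequence of 0/1 giving the module whose result is used each cycle.
    guarded:  when True, a guard latch in front of each module passes new
              operands only on cycles where that module is selected.
    """
    seen = [None, None]   # last value presented to each module
    counts = [0, 0]
    for value, sel in zip(operands, selects):
        for module in (0, 1):
            if guarded and module != sel:
                continue  # the latch holds the old value, so nothing toggles
            if seen[module] != value:
                counts[module] += 1
                seen[module] = value
    return counts


operands = [3, 7, 7, 2, 9, 4]
selects = [0, 0, 1, 1, 0, 1]
original = input_transitions(operands, selects, guarded=False)
guarded = input_transitions(operands, selects, guarded=True)
```

With guarding, each module only sees the input changes for the cycles in which its result is needed, which is the source of the power savings described above.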
A.6.4 Bus Deactivation. According to Tiwari [14], “the idea is to restrict the
enabling  condition  on  tristate  buffers  driving  a  bus,  such  that  the  bus  is  not  driven 
on  cycles  when  it  is  known  that  its  results  will  not  be  used.  Such  unnecessary  activity 
on  heavily  loaded  buses  can  lead  to  significant  power  wastage.” 

A.6.5 Self-timed Techniques. Self-timed techniques are useful with on-chip
memories to reduce power. In microprocessors the memory can cause large power
consumption as data is transferred back and forth to the cache in on-chip memory
systems. “A low-power optimization is to pre-charge the bit lines and activate
the sense-amps using short pulses. These pulses are generated only when changes
are detected on the address lines and a read or write is going to be
initiated” [14].

A.7 Synthesize

The synthesis approach in this research uses the Cadence CAD tools. The paper
“Leakage Power Optimization Flow” [11], written by Sirsi, discusses three different
power flow models. The model that worked best for reducing power consumption at a
clock speed of 200 MHz, a 130 nm technology, and 120K cell instances was his Power
Flow 3, which uses RTL Compiler and System on Chip (SoC) Encounter. Sirsi’s power
flow model is illustrated in Figure A.5. In general, the flow taken from Cadence’s
support documentation has the block diagrams broken into simpler blocks. The most
important thing for a good design is the design flow used for the system. Writing
a good script to address these challenges plays a large role in using the Cadence
tools. The first step is to set up the environment settings along with the
technology library that will be used. Next, load the VHDL file and perform
elaboration on the design. Then apply constraints such as timing, design rule
constraints, and input and output delays. Once these have been set, it is a good
idea to add optimization settings to the design. The final steps of the script
synthesize the design, perform analysis, and export the design as an RTL file.
[Figure A.5 block labels: RTL Compiler, SoC Encounter]

Figure A.5: Leakage Power Optimization Flow
Appendix B. Matlab® Code

This section has the Matlab® code that was written by Sohaib Khan [8] for Optical
Flow using the Lucas-Kanade method.


B.1 HierarchicalLK

Listing B.1: The HierarchicalLK.m Matlab® file. (appendix2/HierarchicalLK.m)

function [u,v,cert] = HierarchicalLK(im1, im2, numLevels, ...
    windowSize, iterations, display)
% HIERARCHICALLK Hierarchical Lucas Kanade (using pyramids)
% [u,v] = HierarchicalLK(im1, im2, numLevels, windowSize, iterations, display)
% Tested for pyramids of height 1, 2, 3 only...
% operation with pyramids of height 4 might be unreliable
%
% Use quiver(u, -v, 0) to view the results
%
% NUMLEVELS   Pyramid Levels (typical value 3)
% WINDOWSIZE  Size of smoothing window (typical value 1-4)
% ITERATIONS  number of iterations (typical value 1-5)
% DISPLAY     1 to display flow fields (1 or 0)
%
% Uses: Reduce, Expand
%
% Sohaib Khan
% edited 05-15-03 (Yaser)
% yaser@cs.ucf.edu
%
% [1] B.D. Lucas and T. Kanade, "An Iterative Image Registration
% technique, with an Application to Stereo Vision," Int'l Joint
% Conference Artificial Intelligence, pp. 121-130, 1981.

if (size(im1,1) ~= size(im2,1)) | (size(im1,2) ~= size(im2,2))
    error('images are not same size');
end;

if (size(im1,3) ~= 1) | (size(im2,3) ~= 1)
    error('input should be gray level images');
end;

% check image sizes and crop if not divisible
if (rem(size(im1,1), 2^(numLevels - 1)) ~= 0)
    warning('image will be cropped in height, size of output will be smaller than input!');
    im1 = im1(1:(size(im1,1) - rem(size(im1,1), 2^(numLevels - 1))), :);
    im2 = im2(1:(size(im1,1) - rem(size(im1,1), 2^(numLevels - 1))), :);
end;

if (rem(size(im1,2), 2^(numLevels - 1)) ~= 0)
    warning('image will be cropped in width, size of output will be smaller than input!');
    im1 = im1(:, 1:(size(im1,2) - rem(size(im1,2), 2^(numLevels - 1))));
    im2 = im2(:, 1:(size(im1,2) - rem(size(im1,2), 2^(numLevels - 1))));
end;

% Build Pyramids
pyramid1 = im1;
pyramid2 = im2;

for i = 2:numLevels
    im1 = reduce(im1);
    im2 = reduce(im2);
    pyramid1(1:size(im1,1), 1:size(im1,2), i) = im1;
    pyramid2(1:size(im2,1), 1:size(im2,2), i) = im2;
end;

% base level computation
disp('Computing Level 1');
baseIm1 = pyramid1(1:(size(pyramid1,1)/(2^(numLevels-1))), ...
    1:(size(pyramid1,2)/(2^(numLevels-1))), numLevels);
baseIm2 = pyramid2(1:(size(pyramid2,1)/(2^(numLevels-1))), ...
    1:(size(pyramid2,2)/(2^(numLevels-1))), numLevels);
[u,v] = LucasKanade(baseIm1, baseIm2, windowSize);

for r = 1:iterations
    [u, v] = LucasKanadeRefined(u, v, baseIm1, baseIm2);
end

% propagating flow 2 higher levels
for i = 2:numLevels
    disp(['Computing Level ', num2str(i)]);
    % use appropriate expand function (gaussian, bilinear, cubic, etc).
    uEx = 2 * imresize(u, size(u)*2);
    vEx = 2 * imresize(v, size(v)*2);

    curIm1 = pyramid1(1:(size(pyramid1,1)/(2^(numLevels - i))), ...
        1:(size(pyramid1,2)/(2^(numLevels - i))), (numLevels - i) + 1);
    curIm2 = pyramid2(1:(size(pyramid2,1)/(2^(numLevels - i))), ...
        1:(size(pyramid2,2)/(2^(numLevels - i))), (numLevels - i) + 1);

    [u, v] = LucasKanadeRefined(uEx, vEx, curIm1, curIm2);
    for r = 1:iterations
        [u, v, cert] = LucasKanadeRefined(u, v, curIm1, curIm2);
    end
end;

if (display==1)
    figure, quiver(reduce((reduce(medfilt2(flipud(u),[5 5])))), ...
        reduce((reduce(medfilt2(flipud(v),[5 5])))), 0), axis equal
end
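The cropping test at the top of HierarchicalLK exists so that every pyramid level divides evenly by two. The Python sketch below reproduces only that size bookkeeping; the function names are hypothetical and no image data is processed.

```python
def cropped_size(rows, cols, num_levels):
    """Crop each dimension down to a multiple of 2**(num_levels - 1)."""
    step = 2 ** (num_levels - 1)
    return rows - rows % step, cols - cols % step


def pyramid_sizes(rows, cols, num_levels):
    """Return the (rows, cols) of every pyramid level, finest level first."""
    rows, cols = cropped_size(rows, cols, num_levels)
    return [(rows // 2 ** k, cols // 2 ** k) for k in range(num_levels)]


sizes = pyramid_sizes(243, 322, 3)
```

For a 243 x 322 input and three levels, the image is first cropped to 240 x 320 so that the two coarser levels come out as 120 x 160 and 60 x 80.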


B.2 Reduce

Listing B.2: The Reduce.m Matlab® file. (appendix2/Reduce.m)

function smallIm = Reduce(im)
% REDUCE Compute smaller layer of Gaussian Pyramid
% Sohaib Khan, Feb 16, 2000
%
% Algo
% Gaussian mask = [0.05 0.25 0.4 0.25 0.05]
% Apply 1d mask to alternate pixels along each row of image
% apply 1d mask to each pixel along alternate columns of
% resulting image

mask = [0.05 0.25 0.4 0.25 0.05];

hResult = conv2(im, mask);
hResult = hResult(:, 3:size(hResult,2)-2);
hResult = hResult(:, 1:2:size(hResult,2));

vResult = conv2(hResult, mask');
vResult = vResult(3:size(vResult,1)-2, :);
vResult = vResult(1:2:size(vResult,1), :);

smallIm = vResult;
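For readers without Matlab, the Reduce step can be sketched in plain Python. The sketch follows the same order as the listing (filter and decimate along rows, then along columns) and assumes zero padding at the borders, which is what the full convolution followed by trimming amounts to. The function names are hypothetical.

```python
MASK = [0.05, 0.25, 0.4, 0.25, 0.05]


def smooth_and_decimate(row):
    """Filter one row with the 5-tap mask and keep every other sample."""
    n = len(row)
    out = []
    for center in range(0, n, 2):          # alternate pixels: 0, 2, 4, ...
        acc = 0.0
        for k, weight in enumerate(MASK):
            idx = center + k - 2           # taps at center-2 .. center+2
            if 0 <= idx < n:               # samples outside the image count as zero
                acc += weight * row[idx]
        out.append(acc)
    return out


def reduce_image(image):
    """Separable reduce: rows first, then columns."""
    h_result = [smooth_and_decimate(row) for row in image]
    columns = [list(col) for col in zip(*h_result)]
    v_result = [smooth_and_decimate(col) for col in columns]
    return [list(row) for row in zip(*v_result)]


small = reduce_image([[1.0] * 8 for _ in range(8)])
```

An 8 x 8 image of ones reduces to 4 x 4. Interior samples stay at 1.0 because the mask sums to one, while border samples are attenuated by the zero padding.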

B.3 LucasKanade

Listing B.3: The LucasKanade.m Matlab® file. (appendix2/LucasKanade.m)

function [u, v] = LucasKanade(im1, im2, windowSize);
% LucasKanade lucas kanade algorithm, without pyramids
% (only 1 level);
% REVISION: NaN vals are replaced by zeros

[fx, fy, ft] = ComputeDerivatives(im1, im2);

u = zeros(size(im1));
v = zeros(size(im2));

halfWindow = floor(windowSize/2);
for i = halfWindow+1:size(fx,1)-halfWindow
    for j = halfWindow+1:size(fx,2)-halfWindow
        curFx = fx(i-halfWindow:i+halfWindow, j-halfWindow:j+halfWindow);
        curFy = fy(i-halfWindow:i+halfWindow, j-halfWindow:j+halfWindow);
        curFt = ft(i-halfWindow:i+halfWindow, j-halfWindow:j+halfWindow);

        curFx = curFx';
        curFy = curFy';
        curFt = curFt';

        curFx = curFx(:);
        curFy = curFy(:);
        curFt = -curFt(:);

        A = [curFx curFy];
        U = pinv(A'*A)*A'*curFt;

        u(i,j) = U(1);
        v(i,j) = U(2);
    end;
end;

u(isnan(u)) = 0;
v(isnan(v)) = 0;

%u = u(2:size(u,1), 2:size(u,2));
%v = v(2:size(v,1), 2:size(v,2));

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [fx, fy, ft] = ComputeDerivatives(im1, im2);
% ComputeDerivatives Compute horizontal, vertical and time
% derivative between two gray-level images.

if (size(im1,1) ~= size(im2,1)) | (size(im1,2) ~= size(im2,2))
    error('input images are not the same size');
end;

if (size(im1,3) ~= 1) | (size(im2,3) ~= 1)
    error('method only works for gray-level images');
end;

fx = conv2(im1, 0.25*[-1 1; -1 1]) + conv2(im2, 0.25*[-1 1; -1 1]);
fy = conv2(im1, 0.25*[-1 -1; 1 1]) + conv2(im2, 0.25*[-1 -1; 1 1]);
ft = conv2(im1, 0.25*ones(2)) + conv2(im2, -0.25*ones(2));

% make same size as input
fx = fx(1:size(fx,1)-1, 1:size(fx,2)-1);
fy = fy(1:size(fy,1)-1, 1:size(fy,2)-1);
ft = ft(1:size(ft,1)-1, 1:size(ft,2)-1);
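The core of the listing is the least-squares solve U = pinv(A'A) A' b with A = [fx fy] and b = -ft. Because A'A is only 2 x 2, the solve can be written out by hand, as in the Python sketch below. One stated difference: this sketch returns None when A'A is singular, whereas the Matlab code uses a pseudoinverse in that case. The function name is hypothetical.

```python
def solve_flow(fx, fy, ft):
    """Solve the 2x2 normal equations (A'A) U = A'b, A = [fx fy], b = -ft."""
    sxx = sum(x * x for x in fx)
    sxy = sum(x * y for x, y in zip(fx, fy))
    syy = sum(y * y for y in fy)
    bx = -sum(x * t for x, t in zip(fx, ft))
    by = -sum(y * t for y, t in zip(fy, ft))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None  # assumption: give up instead of using a pseudoinverse
    u = (syy * bx - sxy * by) / det
    v = (sxx * by - sxy * bx) / det
    return u, v


# Synthetic window: derivatives generated from a known flow of (0.5, -0.25).
fx = [1.0, 0.0, 2.0, -1.0]
fy = [0.0, 1.0, 1.0, 3.0]
ft = [-(x * 0.5 + y * -0.25) for x, y in zip(fx, fy)]
flow = solve_flow(fx, fy, ft)
```

Given derivatives that are exactly consistent with one flow vector, the solve recovers that vector. With real image data the result is the least-squares fit over the window.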
B.4 LucasKanadeRefined

Listing B.4: The LucasKanadeRefined.m Matlab® file. (appendix2/LucasKanadeRefined.m)

function [u,v,cert] = LucasKanadeRefined(uIn, vIn, im1, im2);
% Lucas Kanade Refined computes lucas kanade flow at the current
% level given previous estimates. Current implementation is only
% for a 3x3 window

%[fx, fy, ft] = ComputeDerivatives(im1, im2);

uIn = round(uIn);
vIn = round(vIn);

%uIn = uIn(2:size(uIn,1), 2:size(uIn,2)-1);
%vIn = vIn(2:size(vIn,1), 2:size(vIn,2)-1);

u = zeros(size(im1));
v = zeros(size(im2));

% to compute derivatives, use a 5x5 block...
% the resulting derivative will be 5x5...
% take the middle 3x3 block as derivative
for i = 3:size(im1,1)-2
    for j = 3:size(im2,2)-2
        % if uIn(i,j) ~= 0
        %     disp('ha');
        % end;

        curIm1 = im1(i-2:i+2, j-2:j+2);
        lowRindex = i-2+vIn(i,j);
        highRindex = i+2+vIn(i,j);
        lowCindex = j-2+uIn(i,j);
        highCindex = j+2+uIn(i,j);

        if (lowRindex < 1)
            lowRindex = 1;
            highRindex = 5;
        end;

        if (highRindex > size(im1,1))
            lowRindex = size(im1,1)-4;
            highRindex = size(im1,1);
        end;

        if (lowCindex < 1)
            lowCindex = 1;
            highCindex = 5;
        end;

        if (highCindex > size(im1,2))
            lowCindex = size(im1,2)-4;
            highCindex = size(im1,2);
        end;

        if isnan(lowRindex)
            lowRindex = i-2;
            highRindex = i+2;
        end;

        if isnan(lowCindex)
            lowCindex = j-2;
            highCindex = j+2;
        end;

        curIm2 = im2(lowRindex:highRindex, lowCindex:highCindex);

        [curFx, curFy, curFt] = ComputeDerivatives(curIm1, curIm2);

        curFx = curFx(2:4, 2:4);
        curFy = curFy(2:4, 2:4);
        curFt = curFt(2:4, 2:4);

        curFx = curFx';
        curFy = curFy';
        curFt = curFt';

        curFx = curFx(:);
        curFy = curFy(:);
        curFt = -curFt(:);

        A = [curFx curFy];
        U = pinv(A'*A)*A'*curFt;

        u(i,j) = U(1);
        v(i,j) = U(2);

        cert(i,j) = rcond(A'*A);
    end;
end;

u = u+uIn;
v = v+vIn;

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [fx, fy, ft] = ComputeDerivatives(im1, im2);
% ComputeDerivatives Compute horizontal, vertical and time
% derivative between two gray-level images.

if (size(im1,1) ~= size(im2,1)) | (size(im1,2) ~= size(im2,2))
    error('input images are not the same size');
end;

if (size(im1,3) ~= 1) | (size(im2,3) ~= 1)
    error('method only works for gray-level images');
end;

fx = conv2(im1, 0.25*[-1 1; -1 1]) + conv2(im2, 0.25*[-1 1; -1 1]);
fy = conv2(im1, 0.25*[-1 -1; 1 1]) + conv2(im2, 0.25*[-1 -1; 1 1]);
ft = conv2(im1, 0.25*ones(2)) + conv2(im2, -0.25*ones(2));

% make same size as input
fx = fx(1:size(fx,1)-1, 1:size(fx,2)-1);
fy = fy(1:size(fy,1)-1, 1:size(fy,2)-1);
ft = ft(1:size(ft,1)-1, 1:size(ft,2)-1);
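The four boundary tests in LucasKanadeRefined keep the shifted 5 x 5 window inside the second image. The Python sketch below isolates that logic for one dimension, using 1-based indices as in the Matlab code. The function name and the `half` parameter are hypothetical, and the NaN checks from the listing are left out.

```python
def clamp_window(center, shift, size, half=2):
    """Return 1-based (low, high) bounds of a window shifted by the prior flow.

    A window that would leave the image is snapped to the nearest border so
    that it still covers 2*half + 1 samples.
    """
    low = center - half + shift
    high = center + half + shift
    if low < 1:
        low, high = 1, 2 * half + 1
    if high > size:
        low, high = size - 2 * half, size
    return low, high


inside = clamp_window(10, 1, 20)
past_start = clamp_window(3, -4, 20)
past_end = clamp_window(18, 5, 20)
```

A shift that keeps the window inside the image leaves it unchanged, while shifts past either border return the first or last five samples.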
B.5 Expand

Listing B.5: The Expand.m Matlab® file. (appendix2/Expand.m)

function largeIm = Expand(im);
% EXPAND Compute large layer of Gaussian pyramid
% Sohaib Khan, Feb 16, 2000
% Algo
% Gaussian mask = [0.05 0.25 0.4 0.25 0.05]
% Insert zeros in every alternate row position and conv with mask
% insert zeros in every alternate clmn position in result
% and conv with mask'

mask = 2*[0.05 0.25 0.4 0.25 0.05];
% factor of 2 is there because
% each pixel gets contribution either
% from 0.05, 0.4, 0.05 or from 0.25, 0.25

% insert zeros in every alternate position in each row
rowZeros = [im; zeros(size(im))];
rowZeros = reshape(rowZeros, size(im,1), 2*size(im,2));

% conv with horiz mask
newIm = conv2(rowZeros, mask);
newIm = newIm(:, 3:size(newIm,2)-2);

% insert zeros in every alternate position in each col
colZeros = newIm';
colZeros = [colZeros; zeros(size(colZeros))];
colZeros = reshape(colZeros, size(colZeros,1)/2, 2*size(colZeros,2));
colZeros = colZeros';

largeIm = conv2(colZeros, mask');
largeIm = largeIm(3:size(largeIm,1)-2, :);

Appendix  C.  VHDL  Code 


This  section  has  the  Top-Level  Designs  that  were  coded  in  VHDL  to  convert  the 
Matlab®  commands  and  functions. 


C.1 Reduce Matrix Function Top-Level 1 Behavior Multiplier

Listing C.1: The ReduceConv2Module.vhd VHDL file. (appendix3/ReduceConv2Module.vhd)

-- Capt. Jason Shirley
-- This is the Reduce Matrix Function Top-Level Design 1 using the
-- Behavior Multiplier. These Modules are structurally connected.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use ieee.numeric_std.all;

entity reduceconv2_module is

  generic (address_width : integer := 8;
           data_width    : integer := 24);

  port (clk                          : in  std_logic;
        reset                        : in  std_logic;
        enable_matrixA               : in  std_logic;
        enable_conv2_module          : in  std_logic;
        read_address_matrixA         : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixReduce    : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values         : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC          : out std_logic_vector(data_width - 1 downto 0);
        reduce_module_finished       : out std_logic;
        input_to_matrixC_hresult     : out std_logic_vector(data_width - 1 downto 0);
        writeEnable_matrixC_hresult  : out std_logic;
        address_matrixA_hresult      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixB_hresult      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixC_hresult      : out std_logic_vector(address_width - 1 downto 0);
        matrixA_output_hresult       : out std_logic_vector(data_width - 1 downto 0);
        matrixB_output_hresult       : out std_logic_vector(data_width - 1 downto 0);
        control_adder_signal_hresult : out std_logic;
        input_to_matrixC_vresult     : out std_logic_vector(data_width - 1 downto 0);
        writeEnable_matrixC_vresult  : out std_logic;
        address_matrixA_vresult      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixB_vresult      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixC_vresult      : out std_logic_vector(address_width - 1 downto 0);
        matrixA_output_vresult       : out std_logic_vector(data_width - 1 downto 0);
        matrixB_output_vresult       : out std_logic_vector(data_width - 1 downto 0);
        control_adder_signal_vresult : out std_logic
       );

end reduceconv2_module;


architecture structure_reduceconv2_module of reduceconv2_module is

  component hresultconv2_module is
    generic (address_width : integer := 8;
             data_width    : integer := 24);

    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          enable_matrixA       : in  std_logic;
          enable_conv2_module  : in  std_logic;
          read_address_matrixA : in  std_logic_vector(address_width - 1 downto 0);
          read_address_matrixC : in  std_logic_vector(address_width - 1 downto 0);
          input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
          output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0);
          enable_reduce        : out std_logic;
          input_to_matrixC     : out std_logic_vector(data_width - 1 downto 0);
          writeEnable_matrixC  : out std_logic;
          address_matrixA      : out std_logic_vector(address_width - 1 downto 0);
          address_matrixB      : out std_logic_vector(address_width - 1 downto 0);
          address_matrixC      : out std_logic_vector(address_width - 1 downto 0);
          matrixA_output       : out std_logic_vector(data_width - 1 downto 0);
          matrixB_output       : out std_logic_vector(data_width - 1 downto 0);
          control_adder_signal : out std_logic
         );

  end component hresultconv2_module;


  component hresultreducecontrol is

    generic (address_width : integer := 8;
             n1            : integer := 5;
             n2            : integer := 11);
    -- address_width must be an even number

    -- The hResultReduceControl module reduces the original 6x8
    -- input matrix in half from the number of columns from the
    -- hResultControl module. Example: 6 x 12 will be reduced down
    -- to 6x4. n1 is the row position and n2 is the column position.
    -- You want n1 and n2 to be one size smaller than the 6 x 12,
    -- therefore n1 should equal 5 since you start at zero
    -- and n2 should equal 11 since you start at zero.

    port (clk                         : in  std_logic;
          reset                       : in  std_logic;
          enable_control              : in  std_logic;
          writeEnable_matrix          : out std_logic;
          enable_vresult              : out std_logic;
          read_address_hResult        : out std_logic_vector(address_width - 1 downto 0);
          write_address_reducehResult : out std_logic_vector(address_width - 1 downto 0)
         );

  end component hresultreducecontrol;


  component vresultconv2_module is
    generic (address_width : integer := 8;
             data_width    : integer := 24);

    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          enable_matrixA       : in  std_logic;
          enable_conv2_module  : in  std_logic;
          read_address_matrixA : in  std_logic_vector(address_width - 1 downto 0);
          read_address_matrixC : in  std_logic_vector(address_width - 1 downto 0);
          input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
          output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0);
          enable_reduce        : out std_logic;
          input_to_matrixC     : out std_logic_vector(data_width - 1 downto 0);
          writeEnable_matrixC  : out std_logic;
          address_matrixA      : out std_logic_vector(address_width - 1 downto 0);
          address_matrixB      : out std_logic_vector(address_width - 1 downto 0);
          address_matrixC      : out std_logic_vector(address_width - 1 downto 0);
          matrixA_output       : out std_logic_vector(data_width - 1 downto 0);
          matrixB_output       : out std_logic_vector(data_width - 1 downto 0);
          control_adder_signal : out std_logic
         );

  end component vresultconv2_module;

  component vresultreducecontrol is

    generic (address_width : integer := 8;
             n1            : integer := 9;
             n2            : integer := 3);
    -- address_width must be an even number

    port (clk                         : in  std_logic;
          reset                       : in  std_logic;
          enable_control              : in  std_logic;
          writeEnable_matrix          : out std_logic;
          enable_vresult              : out std_logic;
          read_address_vResult        : out std_logic_vector(address_width - 1 downto 0);
          write_address_reducevResult : out std_logic_vector(address_width - 1 downto 0)
         );

  end component vresultreducecontrol;

  component mem_matrix is
    generic (address_width : integer := 8;
             data_width    : integer := 24);

    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          writeEnable          : in  std_logic;
          matrix_values        : in  std_logic_vector(data_width - 1 downto 0);
          write_address        : in  std_logic_vector(address_width - 1 downto 0);
          read_address         : in  std_logic_vector(address_width - 1 downto 0);
          output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
         );

  end component mem_matrix;


  signal enable_reduce_h_to_enable_control,
         writeEnable_matrix_to_enable_matrixA : std_logic;
  signal enable_vresult_to_enable_conv2_module,
         enable_reduce_v_to_enable_control : std_logic;
  signal writeEnable_matrix_to_writeEnable : std_logic;
  signal write_address_reducevResult_to_write_address :
         std_logic_vector(address_width - 1 downto 0);
  signal read_address_hResult_to_read_address_matrixC :
         std_logic_vector(address_width - 1 downto 0);
  signal read_addressvResult_to_read_address_matrixC :
         std_logic_vector(address_width - 1 downto 0);
  signal write_address_reducehResult_to_read_address_matrixA :
         std_logic_vector(address_width - 1 downto 0);
  signal output_data_matrixC_to_input_matrixA_values :
         std_logic_vector(data_width - 1 downto 0);
  signal output_data_matrixC_to_matrix_values :
         std_logic_vector(data_width - 1 downto 0);

begin 

hresultconv2_module1 : hresultconv2_module port map (
  clk => clk,
  reset => reset,
  enable_matrixA => enable_matrixA,
  enable_conv2_module => enable_conv2_module,
  read_address_matrixA => read_address_matrixA,
  read_address_matrixC => read_address_hResult_to_read_address_matrixC,
  input_matrixA_values => input_matrixA_values,
  output_data_matrixC => output_data_matrixC_to_input_matrixA_values,
  enable_reduce => enable_reduce_h_to_enable_control,
  input_to_matrixC => input_to_matrixC_hresult,
  writeEnable_matrixC => writeEnable_matrixC_hresult,
  address_matrixA => address_matrixA_hresult,
  address_matrixB => address_matrixB_hresult,
  address_matrixC => address_matrixC_hresult,
  matrixA_output => matrixA_output_hresult,
  matrixB_output => matrixB_output_hresult,
  control_adder_signal => control_adder_signal_hresult);

hresultreducecontrol1 : hresultreducecontrol port map (
  clk => clk,
  reset => reset,
  enable_control => enable_reduce_h_to_enable_control,
  writeEnable_matrix => writeEnable_matrix_to_enable_matrixA,
  enable_vresult => enable_vresult_to_enable_conv2_module,
  read_address_hResult => read_address_hResult_to_read_address_matrixC,
  write_address_reducehResult => write_address_reducehResult_to_read_address_matrixA);

vresultconv2_module1 : vresultconv2_module port map (
  clk => clk,
  reset => reset,
  enable_matrixA => writeEnable_matrix_to_enable_matrixA,
  enable_conv2_module => enable_vresult_to_enable_conv2_module,
  read_address_matrixA => write_address_reducehResult_to_read_address_matrixA,
  read_address_matrixC => read_addressvResult_to_read_address_matrixC,
  input_matrixA_values => output_data_matrixC_to_input_matrixA_values,
  output_data_matrixC => output_data_matrixC_to_matrix_values,
  enable_reduce => enable_reduce_v_to_enable_control,
  input_to_matrixC => input_to_matrixC_vresult,
  writeEnable_matrixC => writeEnable_matrixC_vresult,
  address_matrixA => address_matrixA_vresult,
  address_matrixB => address_matrixB_vresult,
  address_matrixC => address_matrixC_vresult,
  matrixA_output => matrixA_output_vresult,
  matrixB_output => matrixB_output_vresult,
  control_adder_signal => control_adder_signal_vresult);

vresultreducecontrol1 : vresultreducecontrol port map (
  clk => clk,
  reset => reset,
  enable_control => enable_reduce_v_to_enable_control,
  writeEnable_matrix => writeEnable_matrix_to_writeEnable,
  enable_vresult => reduce_module_finished,
  read_address_vResult => read_addressvResult_to_read_address_matrixC,
  write_address_reducevResult => write_address_reducevResult_to_write_address);

mem_matrixReduce : mem_matrix port map (
  clk => clk,
  reset => reset,
  writeEnable => writeEnable_matrix_to_writeEnable,
  matrix_values => output_data_matrixC_to_matrix_values,
  write_address => write_address_reducevResult_to_write_address,
  read_address => read_address_matrixReduce,
  output_matrix_values => output_data_matrixC);

end structure_reduceconv2_module;
C.2 Reduce Matrix Function Top-Level 2 Behavior Multiplier

Listing C.2: The ReduceConv2Module2.vhd VHDL file.

(appendix3/ReduceConv2Module2.vhd)


-- Capt. Jason Shirley
-- This is the Reduce Matrix Function Top-Level Design 2 using the
-- Behavior Multiplier. These modules are structurally connected.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use ieee.numeric_std.all;

entity reduceconv2_module2 is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                       : in  std_logic;
        reset                     : in  std_logic;
        enable_matrixA            : in  std_logic;
        enable_conv2_module       : in  std_logic;
        read_address_matrixA      : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixReduce : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values      : in  std_logic_vector(data_width - 1 downto 0);
        reduce_module_finished    : out std_logic;
        output_data_matrixReduce  : out std_logic_vector(data_width - 1 downto 0)
       );
end reduceconv2_module2;

architecture structure_reduceconv2_module2 of reduceconv2_module2 is

component hresultcontrol2 is
  generic (address_width : integer := 8;
           n1            : integer := 5;
           n2            : integer := 11);
  -- address_width must be an even number
  port (clk                   : in  std_logic;
        reset                 : in  std_logic;
        enable_control        : in  std_logic;
        update_reg_enable     : out std_logic;
        writeEnable_matrixC   : out std_logic;
        reset_reg             : out std_logic;
        enable_reduce         : out std_logic;
        read_address_matrixA  : out std_logic_vector(address_width - 1 downto 0);
        read_address_matrixB  : out std_logic_vector(address_width - 1 downto 0);
        write_address_matrixC : out std_logic_vector(address_width - 1 downto 0);
        mux_enable            : out std_logic
       );
end component hresultcontrol2;


component hresultreducecontrol2 is
  generic (address_width : integer := 8;
           n1            : integer := 5;
           n2            : integer := 11);
  -- address_width must be an even number
  -- The hResultReduceControl module reduces the original 6x8
  -- input matrix in half from the number of columns from the
  -- hResultControl module. Example: 6 x 12 will be reduced down
  -- to 6x4. n1 is the row position and n2 is the column position.
  -- You want n1 and n2 to be one size smaller than the 6 x 12,
  -- therefore n1 should equal 5 since you start at zero
  -- and n2 should equal 11 since you start at zero.
  port (clk                         : in  std_logic;
        reset                       : in  std_logic;
        enable_control              : in  std_logic;
        writeEnable_matrix          : out std_logic;
        enable_vresult              : out std_logic;
        read_address_hResult        : out std_logic_vector(address_width - 1 downto 0);
        write_address_reducehResult : out std_logic_vector(address_width - 1 downto 0);
        mux_enable                  : out std_logic
       );
end component hresultreducecontrol2;


component vresultcontrol2 is
  generic (address_width : integer := 8;
           n1            : integer := 9;
           n2            : integer := 3);
  -- address_width must be an even number
  port (clk                   : in  std_logic;
        reset                 : in  std_logic;
        enable_control        : in  std_logic;
        update_reg_enable     : out std_logic;
        writeEnable_matrixC   : out std_logic;
        reset_reg             : out std_logic;
        enable_reduce         : out std_logic;
        read_address_matrixA  : out std_logic_vector(address_width - 1 downto 0);
        read_address_matrixB  : out std_logic_vector(address_width - 1 downto 0);
        write_address_matrixC : out std_logic_vector(address_width - 1 downto 0)
       );
end component vresultcontrol2;


component vresultreducecontrol2 is
  generic (address_width : integer := 8;
           n1            : integer := 9;
           n2            : integer := 3);
  -- address_width must be an even number
  port (clk                         : in  std_logic;
        reset                       : in  std_logic;
        enable_control              : in  std_logic;
        writeEnable_matrix          : out std_logic;
        enable_vresult              : out std_logic;
        read_address_vResult        : out std_logic_vector(address_width - 1 downto 0);
        write_address_reducevResult : out std_logic_vector(address_width - 1 downto 0)
       );
end component vresultreducecontrol2;

component mux2to1 is
  generic (address_width : integer := 8);
  port (sel        : in  std_logic;
        input1     : in  std_logic_vector(address_width - 1 downto 0);
        input2     : in  std_logic_vector(address_width - 1 downto 0);
        mux_output : out std_logic_vector(address_width - 1 downto 0)
       );
end component mux2to1;


component mux1to2data_width is
  generic (data_width : integer := 24);
  port (sel         : in  std_logic;
        input       : in  std_logic_vector(data_width - 1 downto 0);
        mux_output1 : out std_logic_vector(data_width - 1 downto 0);
        mux_output2 : out std_logic_vector(data_width - 1 downto 0)
       );
end component mux1to2data_width;


component mux2to1data_width is
  generic (data_width : integer := 24);
  port (sel        : in  std_logic;
        input1     : in  std_logic_vector(data_width - 1 downto 0);
        input2     : in  std_logic_vector(data_width - 1 downto 0);
        mux_output : out std_logic_vector(data_width - 1 downto 0)
       );
end component;

component mux6to3enable is
  port (sel         : in  std_logic;
        input1      : in  std_logic;
        input2      : in  std_logic;
        input3      : in  std_logic;
        input4      : in  std_logic;
        input5      : in  std_logic;
        input6      : in  std_logic;
        mux_output1 : out std_logic;
        mux_output2 : out std_logic;
        mux_output3 : out std_logic
       );
end component;


component mem_matrix is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        writeEnable          : in  std_logic;
        matrix_values        : in  std_logic_vector(data_width - 1 downto 0);
        write_address        : in  std_logic_vector(address_width - 1 downto 0);
        read_address         : in  std_logic_vector(address_width - 1 downto 0);
        output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
       );
end component mem_matrix;


component mask1x5mem_matrix is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        read_address         : in  std_logic_vector(address_width - 1 downto 0);
        output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
       );
end component mask1x5mem_matrix;


component mask5x1mem_matrix is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        read_address         : in  std_logic_vector(address_width - 1 downto 0);
        output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
       );
end component mask5x1mem_matrix;


component conv2 is
  generic (data_width : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        update_reg_enable    : in  std_logic;
        input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
        input_matrixB_values : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0)
       );
end component conv2;

signal update_reg_enable_to_mux_input1,
       control_writeEnable_memC_to_WriteEnableMatrixC_mux_input3 : std_logic;
signal control_reset_reg_to_hvcontrol_reset_mux_input2,
       enable_reduce_to_hResultReduceControl : std_logic;
signal mux_enable_to_mux_select, writeEnable_matrix_to_MatrixA2,
       enable_vresult_to_vResultControl : std_logic;
signal mux_enable_to_mux_select_matrixA2,
       update_reg_enable_to_mux_input4 : std_logic;
signal control_writeEnable_memC_to_mux_input6 : std_logic;
signal control_reset_reg_to_mux_input5,
       enable_reduce_to_vResultReduceControl_enable : std_logic;
signal writeEnable_matrix_to_MatrixReduce,
       mux_output_to_conv2_update_reg : std_logic;
signal mux_output_to_conv2_reset,
       mux_out_writeEnableMatrixC_to_writeEnableMatrixC : std_logic;
signal read_address_matrixA_to_mux_input1,
       read_address_matrixB_to_mask1x5 :
       std_logic_vector(address_width - 1 downto 0);
signal write_address_matrixC_to_WriteAddressMatrix_mux_input1,
       write_address_reducehResult_to_mux_input1_matrixA2_address :
       std_logic_vector(address_width - 1 downto 0);
signal read_address_hResult_to_mux_input1_readAddressMatrixC,
       write_address_matrixC_to_WriteAddressMatrix_mux_input2 :
       std_logic_vector(address_width - 1 downto 0);
signal write_address_reducevResult_to_mux_input2_matrixA2_address,
       read_address_matrixB_to_mask5x1 :
       std_logic_vector(address_width - 1 downto 0);
signal read_address_vResult_to_mux_input2_readAddressMatrixC,
       write_address_reducevResult_to_MatrixReduce :
       std_logic_vector(address_width - 1 downto 0);
signal mux_ouput_to_matrixA1, mux_output_to_matrixA2_address :
       std_logic_vector(address_width - 1 downto 0);
signal read_address_matrixC_to_address_matrixC,
       mux_output_WriteAddressMatrixC_to_write_address_matrixC :
       std_logic_vector(address_width - 1 downto 0);
signal output_matrix_values_to_mux_input1_matrixA_values,
       output_matrix_values_to_mux_input2_matrixA_values :
       std_logic_vector(data_width - 1 downto 0);
signal mux_output_matrixA_values_to_conv2_input1,
       output_matrix_values_to_mux_input1_matrixB_values :
       std_logic_vector(data_width - 1 downto 0);
signal output_matrix_values_to_mux_input2_matrixB_values,
       mux_output_matrixB_to_conv2_input2 :
       std_logic_vector(data_width - 1 downto 0);
signal output_matrixC_values_to_mux_input_values,
       mux_output_MatrixC_values_to_matrixA2_values :
       std_logic_vector(data_width - 1 downto 0);
signal mux_output_MatrixC_values_to_matrixReduce_values :
       std_logic_vector(data_width - 1 downto 0);
signal output_data_matrixC_to_matrixC_values :
       std_logic_vector(data_width - 1 downto 0);

begin 

hresultcontrol1 : hresultcontrol2 port map (
  clk => clk,
  reset => reset,
  enable_control => enable_conv2_module,
  update_reg_enable => update_reg_enable_to_mux_input1,
  writeEnable_matrixC => control_writeEnable_memC_to_WriteEnableMatrixC_mux_input3,
  reset_reg => control_reset_reg_to_hvcontrol_reset_mux_input2,
  enable_reduce => enable_reduce_to_hResultReduceControl,
  read_address_matrixA => read_address_matrixA_to_mux_input1,
  read_address_matrixB => read_address_matrixB_to_mask1x5,
  write_address_matrixC => write_address_matrixC_to_WriteAddressMatrix_mux_input1,
  mux_enable => mux_enable_to_mux_select);


hresultreducecontrol1 : hresultreducecontrol2 port map (
  clk => clk,
  reset => reset,
  enable_control => enable_reduce_to_hResultReduceControl,
  writeEnable_matrix => writeEnable_matrix_to_MatrixA2,
  enable_vresult => enable_vresult_to_vResultControl,
  read_address_hResult => read_address_hResult_to_mux_input1_readAddressMatrixC,
  write_address_reducehResult => write_address_reducehResult_to_mux_input1_matrixA2_address,
  mux_enable => mux_enable_to_mux_select_matrixA2);


vresultcontrol1 : vresultcontrol2 port map (
  clk => clk,
  reset => reset,
  enable_control => enable_vresult_to_vResultControl,
  update_reg_enable => update_reg_enable_to_mux_input4,
  writeEnable_matrixC => control_writeEnable_memC_to_mux_input6,
  reset_reg => control_reset_reg_to_mux_input5,
  enable_reduce => enable_reduce_to_vResultReduceControl_enable,
  read_address_matrixA => write_address_reducevResult_to_mux_input2_matrixA2_address,
  read_address_matrixB => read_address_matrixB_to_mask5x1,
  write_address_matrixC => write_address_matrixC_to_WriteAddressMatrix_mux_input2);


vresultreducecontrol1 : vresultreducecontrol2 port map (
  clk => clk,
  reset => reset,
  enable_control => enable_reduce_to_vResultReduceControl_enable,
  writeEnable_matrix => writeEnable_matrix_to_MatrixReduce,
  enable_vresult => reduce_module_finished,
  read_address_vResult => read_address_vResult_to_mux_input2_readAddressMatrixC,
  write_address_reducevResult => write_address_reducevResult_to_MatrixReduce);


mux2to1_matrixA1 : mux2to1 port map (
  sel => enable_matrixA,
  input1 => read_address_matrixA_to_mux_input1,
  input2 => read_address_matrixA,
  mux_output => mux_ouput_to_matrixA1);


mux2to1_matrixA_values : mux2to1data_width port map (
  sel => mux_enable_to_mux_select,
  input1 => output_matrix_values_to_mux_input1_matrixA_values,
  input2 => output_matrix_values_to_mux_input2_matrixA_values,
  mux_output => mux_output_matrixA_values_to_conv2_input1);


mux2to1_matrixB_values : mux2to1data_width port map (
  sel => mux_enable_to_mux_select,
  input1 => output_matrix_values_to_mux_input1_matrixB_values,
  input2 => output_matrix_values_to_mux_input2_matrixB_values,
  mux_output => mux_output_matrixB_to_conv2_input2);


mux6to3_Enable1 : mux6to3enable port map (
  sel => mux_enable_to_mux_select,
  input1 => update_reg_enable_to_mux_input1,
  input2 => control_reset_reg_to_hvcontrol_reset_mux_input2,
  input3 => control_writeEnable_memC_to_WriteEnableMatrixC_mux_input3,
  input4 => update_reg_enable_to_mux_input4,
  input5 => control_reset_reg_to_mux_input5,
  input6 => control_writeEnable_memC_to_mux_input6,
  mux_output1 => mux_output_to_conv2_update_reg,
  mux_output2 => mux_output_to_conv2_reset,
  mux_output3 => mux_out_writeEnableMatrixC_to_writeEnableMatrixC);


mux2to1_ReadAddressMatrixC : mux2to1 port map (
  sel => mux_enable_to_mux_select_matrixA2,
  input1 => read_address_hResult_to_mux_input1_readAddressMatrixC,
  input2 => read_address_vResult_to_mux_input2_readAddressMatrixC,
  mux_output => read_address_matrixC_to_address_matrixC);


mux2to1_WriteAddressMatrixC : mux2to1 port map (
  sel => mux_enable_to_mux_select,
  input1 => write_address_matrixC_to_WriteAddressMatrix_mux_input1,
  input2 => write_address_matrixC_to_WriteAddressMatrix_mux_input2,
  mux_output => mux_output_WriteAddressMatrixC_to_write_address_matrixC);


mux1to2_matrixA2_ReduceMatrix_values : mux1to2data_width port map (
  sel => mux_enable_to_mux_select_matrixA2,
  input => output_matrixC_values_to_mux_input_values,
  mux_output1 => mux_output_MatrixC_values_to_matrixA2_values,
  mux_output2 => mux_output_MatrixC_values_to_matrixReduce_values);


mem_matrixA1 : mem_matrix port map (
  clk => clk,
  reset => reset,
  writeEnable => enable_matrixA,
  matrix_values => input_matrixA_values,
  write_address => mux_ouput_to_matrixA1,
  read_address => mux_ouput_to_matrixA1,
  output_matrix_values => output_matrix_values_to_mux_input1_matrixA_values);


mem_matrixA2 : mem_matrix port map (
  clk => clk,
  reset => reset,
  writeEnable => writeEnable_matrix_to_MatrixA2,
  matrix_values => mux_output_MatrixC_values_to_matrixA2_values,
  write_address => write_address_reducehResult_to_mux_input1_matrixA2_address,
  read_address => write_address_reducevResult_to_mux_input2_matrixA2_address,
  output_matrix_values => output_matrix_values_to_mux_input2_matrixA_values);


mask1x5mem_matrix0 : mask1x5mem_matrix port map (
  clk => clk,
  reset => reset,
  read_address => read_address_matrixB_to_mask1x5,
  output_matrix_values => output_matrix_values_to_mux_input1_matrixB_values);


mask5x1mem_matrix0 : mask5x1mem_matrix port map (
  clk => clk,
  reset => reset,
  read_address => read_address_matrixB_to_mask5x1,
  output_matrix_values => output_matrix_values_to_mux_input2_matrixB_values);


conv2_1 : conv2 port map (
  clk => clk,
  reset => mux_output_to_conv2_reset,
  update_reg_enable => mux_output_to_conv2_update_reg,
  input_matrixA_values => mux_output_matrixA_values_to_conv2_input1,
  input_matrixB_values => mux_output_matrixB_to_conv2_input2,
  output_data_matrixC => output_data_matrixC_to_matrixC_values);


mem_matrixC : mem_matrix port map (
  clk => clk,
  reset => reset,
  writeEnable => mux_out_writeEnableMatrixC_to_writeEnableMatrixC,
  matrix_values => output_data_matrixC_to_matrixC_values,
  write_address => mux_output_WriteAddressMatrixC_to_write_address_matrixC,
  read_address => read_address_matrixC_to_address_matrixC,
  output_matrix_values => output_matrixC_values_to_mux_input_values);


mem_matrixReduce : mem_matrix port map (
  clk => clk,
  reset => reset,
  writeEnable => writeEnable_matrix_to_MatrixReduce,
  matrix_values => mux_output_MatrixC_values_to_matrixReduce_values,
  write_address => write_address_reducevResult_to_MatrixReduce,
  read_address => read_address_matrixReduce,
  output_matrix_values => output_data_matrixReduce);

end structure_reduceconv2_module2;
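
Listing C.2 declares mux6to3enable as a component, but its body is not shown in this listing. The following is a minimal sketch of what such a multiplexer could look like, not the thesis implementation. The port names are taken from the component declaration in Listing C.2; the architecture name and the select polarity (sel = '0' passing input1 through input3, any other value passing input4 through input6) are assumptions made only for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical body for the mux6to3enable component of Listing C.2.
-- Assumption: sel = '0' selects input1..input3 (horizontal controller
-- signals in the port map of Listing C.2); any other value of sel
-- selects input4..input6 (vertical controller signals).
entity mux6to3enable is
  port (sel         : in  std_logic;
        input1      : in  std_logic;
        input2      : in  std_logic;
        input3      : in  std_logic;
        input4      : in  std_logic;
        input5      : in  std_logic;
        input6      : in  std_logic;
        mux_output1 : out std_logic;
        mux_output2 : out std_logic;
        mux_output3 : out std_logic
       );
end entity mux6to3enable;

-- Architecture name is illustrative only.
architecture sketch_mux6to3enable of mux6to3enable is
begin
  -- Each output follows one signal from the selected group of three.
  mux_output1 <= input1 when sel = '0' else input4;
  mux_output2 <= input2 when sel = '0' else input5;
  mux_output3 <= input3 when sel = '0' else input6;
end architecture sketch_mux6to3enable;
```

In the port map of Listing C.2, the three outputs drive the update-register enable and reset of the single conv2 instance and the write enable of the matrix C memory, so one convolution core can be driven by either the horizontal or the vertical controller.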
C.3 Reduce Matrix Function Top-Level 1 Booth Multiplier

Listing C.3: The ReduceConv2ModuleBooth.vhd VHDL file.

(appendix3/ReduceConv2ModuleBooth.vhd)

-- Capt. Jason Shirley
-- This is the Reduce Matrix Function Top-Level Design 1 using the
-- Booth Multiplier. These modules are structurally connected.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use ieee.numeric_std.all;


entity reduceconv2_modulebooth is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                          : in  std_logic;
        reset                        : in  std_logic;
        enable_matrixA               : in  std_logic;
        enable_conv2_module          : in  std_logic;
        read_address_matrixA         : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixReduce    : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values         : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC          : out std_logic_vector(data_width - 1 downto 0);
        reduce_module_finished       : out std_logic;
        input_to_matrixC_hresult     : out std_logic_vector(data_width - 1 downto 0);
        writeEnable_matrixC_hresult  : out std_logic;
        address_matrixA_hresult      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixB_hresult      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixC_hresult      : out std_logic_vector(address_width - 1 downto 0);
        matrixA_output_hresult       : out std_logic_vector(data_width - 1 downto 0);
        matrixB_output_hresult       : out std_logic_vector(data_width - 1 downto 0);
        control_adder_signal_hresult : out std_logic;
        input_to_matrixC_vresult     : out std_logic_vector(data_width - 1 downto 0);
        writeEnable_matrixC_vresult  : out std_logic;
        address_matrixA_vresult      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixB_vresult      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixC_vresult      : out std_logic_vector(address_width - 1 downto 0);
        matrixA_output_vresult       : out std_logic_vector(data_width - 1 downto 0);
        matrixB_output_vresult       : out std_logic_vector(data_width - 1 downto 0);
        control_adder_signal_vresult : out std_logic;
        mult_ready_hresult           : out std_logic;
        mult_done_hresult            : out std_logic;
        mult_ready_vresult           : out std_logic;
        mult_done_vresult            : out std_logic
       );
end reduceconv2_modulebooth;


architecture structure_reduceconv2_modulebooth of reduceconv2_modulebooth is

component hresultboothconv2_module is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        enable_matrixA       : in  std_logic;
        enable_conv2_module  : in  std_logic;
        read_address_matrixA : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixC : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0);
        enable_reduce        : out std_logic;
        input_to_matrixC     : out std_logic_vector(data_width - 1 downto 0);
        writeEnable_matrixC  : out std_logic;
        address_matrixA      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixB      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixC      : out std_logic_vector(address_width - 1 downto 0);
        matrixA_output       : out std_logic_vector(data_width - 1 downto 0);
        matrixB_output       : out std_logic_vector(data_width - 1 downto 0);
        control_adder_signal : out std_logic;
        mult_ready           : out std_logic;
        mult_done            : out std_logic
       );
end component hresultboothconv2_module;


component hresultreducecontrolbooth is
  generic (address_width : integer := 8;
           n1            : integer := 5;
           n2            : integer := 11);
  -- address_width must be an even number
  -- The hResultReduceControl module reduces the original 6x8
  -- input matrix in half from the number of columns from the
  -- hResultControl module. Example: 6 x 12 will be reduced down
  -- to 6x4. n1 is the row position and n2 is the column position.
  -- You want n1 and n2 to be one size smaller than the 6 x 12,
  -- therefore n1 should equal 5 since you start at zero
  -- and n2 should equal 11 since you start at zero.
  port (clk                         : in  std_logic;
        reset                       : in  std_logic;
        enable_control              : in  std_logic;
        writeEnable_matrix          : out std_logic;
        enable_vresult              : out std_logic;
        read_address_hResult        : out std_logic_vector(address_width - 1 downto 0);
        write_address_reducehResult : out std_logic_vector(address_width - 1 downto 0)
       );
end component hresultreducecontrolbooth;


component vresultboothconv2_module is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        enable_matrixA       : in  std_logic;
        enable_conv2_module  : in  std_logic;
        read_address_matrixA : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixC : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0);
        enable_reduce        : out std_logic;
        input_to_matrixC     : out std_logic_vector(data_width - 1 downto 0);
        writeEnable_matrixC  : out std_logic;
        address_matrixA      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixB      : out std_logic_vector(address_width - 1 downto 0);
        address_matrixC      : out std_logic_vector(address_width - 1 downto 0);
        matrixA_output       : out std_logic_vector(data_width - 1 downto 0);
        matrixB_output       : out std_logic_vector(data_width - 1 downto 0);
        control_adder_signal : out std_logic;
        mult_ready           : out std_logic;
        mult_done            : out std_logic
       );
end component vresultboothconv2_module;


component vresultreducecontrolbooth is
  generic (address_width : integer := 8;
           n1            : integer := 9;
           n2            : integer := 3);
  -- address_width must be an even number
  port (clk                         : in  std_logic;
        reset                       : in  std_logic;
        enable_control              : in  std_logic;
        writeEnable_matrix          : out std_logic;
        enable_vresult              : out std_logic;
        read_address_vResult        : out std_logic_vector(address_width - 1 downto 0);
        write_address_reducevResult : out std_logic_vector(address_width - 1 downto 0)
       );
end component vresultreducecontrolbooth;


component mem_matrix is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        writeEnable          : in  std_logic;
        matrix_values        : in  std_logic_vector(data_width - 1 downto 0);
        write_address        : in  std_logic_vector(address_width - 1 downto 0);
        read_address         : in  std_logic_vector(address_width - 1 downto 0);
        output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
       );
end component mem_matrix;


signal enable_reduce_h_to_enable_control,
       writeEnable_matrix_to_enable_matrixA : std_logic;
signal enable_vresult_to_enable_conv2_module,
       enable_reduce_v_to_enable_control : std_logic;
signal writeEnable_matrix_to_writeEnable : std_logic;
signal write_address_reducevResult_to_write_address :
       std_logic_vector(address_width - 1 downto 0);
signal read_address_hResult_to_read_address_matrixC :
       std_logic_vector(address_width - 1 downto 0);
signal read_addressvResult_to_read_address_matrixC :
       std_logic_vector(address_width - 1 downto 0);
signal write_address_reducehResult_to_read_address_matrixA :
       std_logic_vector(address_width - 1 downto 0);
signal output_data_matrixC_to_input_matrixA_values :
       std_logic_vector(data_width - 1 downto 0);
signal output_data_matrixC_to_matrix_values :
       std_logic_vector(data_width - 1 downto 0);

begin 

hresultboothconv2_module1 : hresultboothconv2_module port map (
  clk                  => clk,
  reset                => reset,
  enable_matrixA       => enable_matrixA,
  enable_conv2_module  => enable_conv2_module,
  read_address_matrixA => read_address_matrixA,
  read_address_matrixC => read_address_hResult_to_read_address_matrixC,
  input_matrixA_values => input_matrixA_values,
  output_data_matrixC  => output_data_matrixC_to_input_matrixA_values,
  enable_reduce        => enable_reduce_h_to_enable_control,
  input_to_matrixC     => input_to_matrixC_hresult,
  writeEnable_matrixC  => writeEnable_matrixC_hresult,
  address_matrixA      => address_matrixA_hresult,
  address_matrixB      => address_matrixB_hresult,
  address_matrixC      => address_matrixC_hresult,
  matrixA_output       => matrixA_output_hresult,
  matrixB_output       => matrixB_output_hresult,
  control_adder_signal => control_adder_signal_hresult,
  mult_ready           => mult_ready_hresult,
  mult_done            => mult_done_hresult);

hresultreducecontrolbooth1 : hresultreducecontrolbooth port map (
  clk                         => clk,
  reset                       => reset,
  enable_control              => enable_reduce_h_to_enable_control,
  writeEnable_matrix          => writeEnable_matrix_to_enable_matrixA,
  enable_vresult              => enable_vresult_to_enable_conv2_module,
  read_address_hResult        => read_address_hResult_to_read_address_matrixC,
  write_address_reducehResult => write_address_reducehResult_to_read_address_matrixA);

vresultboothconv2_module1 : vresultboothconv2_module port map (
  clk                  => clk,
  reset                => reset,
  enable_matrixA       => writeEnable_matrix_to_enable_matrixA,
  enable_conv2_module  => enable_vresult_to_enable_conv2_module,
  read_address_matrixA => write_address_reducehResult_to_read_address_matrixA,
  read_address_matrixC => read_addressvResult_to_read_address_matrixC,
  input_matrixA_values => output_data_matrixC_to_input_matrixA_values,
  output_data_matrixC  => output_data_matrixC_to_matrix_values,
  enable_reduce        => enable_reduce_v_to_enable_control,
  input_to_matrixC     => input_to_matrixC_vresult,
  writeEnable_matrixC  => writeEnable_matrixC_vresult,
  address_matrixA      => address_matrixA_vresult,
  address_matrixB      => address_matrixB_vresult,
  address_matrixC      => address_matrixC_vresult,
  matrixA_output       => matrixA_output_vresult,
  matrixB_output       => matrixB_output_vresult,
  control_adder_signal => control_adder_signal_vresult,
  mult_ready           => mult_ready_vresult,
  mult_done            => mult_done_vresult);

vresultreducecontrolbooth1 : vresultreducecontrolbooth port map (
  clk                         => clk,
  reset                       => reset,
  enable_control              => enable_reduce_v_to_enable_control,
  writeEnable_matrix          => writeEnable_matrix_to_writeEnable,
  enable_vresult              => reduce_module_finished,
  read_address_vResult        => read_addressvResult_to_read_address_matrixC,
  write_address_reducevResult => write_address_reducevResult_to_write_address);

mem_matrixReduce : mem_matrix port map (
  clk                  => clk,
  reset                => reset,
  writeEnable          => writeEnable_matrix_to_writeEnable,
  matrix_values        => output_data_matrixC_to_matrix_values,
  write_address        => write_address_reducevResult_to_write_address,
  read_address         => read_address_matrixReduce,
  output_matrix_values => output_data_matrixC);

end structure_reduceconv2_modulebooth;
Listing C.4: The ReduceConv2ModuleBooth2.vhd VHDL file.
(appendix3/ReduceConv2ModuleBooth2.vhd)

-- Capt. Jason Shirley
-- This is the Reduce Matrix Function Top-Level Design 2 using the
-- Booth Multiplier. These modules are structurally connected.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use ieee.numeric_std.all;


entity reduceconv2_modulebooth2 is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                       : in  std_logic;
        reset                     : in  std_logic;
        enable_matrixA            : in  std_logic;
        enable_conv2_module       : in  std_logic;
        read_address_matrixA      : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixReduce : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values      : in  std_logic_vector(data_width - 1 downto 0);
        reduce_module_finished    : out std_logic;
        output_data_matrixReduce  : out std_logic_vector(data_width - 1 downto 0);
        mult_ready_hresult        : out std_logic;
        mult_done_hresult         : out std_logic;
        mult_ready_vresult        : out std_logic;
        mult_done_vresult         : out std_logic
       );
end reduceconv2_modulebooth2;


architecture structure_reduceconv2_modulebooth2 of reduceconv2_modulebooth2 is

component hresultcontrolbooth2 is
  generic (address_width : integer := 8;
           n1            : integer := 5;
           n2            : integer := 11);
  -- address_width must be an even number
  -- n1 and n2 are the size of the input matrix conv2 with the
  -- 1x5 mask. Example: input 6x8 matrix, mask 1x5, new conv2 matrix
  -- equals Conv2 (n1 x n2) = (6+1-1) x (8+5-1), therefore n1 = 6,
  -- n2 = 12, but the matrix starts at zero so n1 = (6-1) = 5,
  -- n2 = (12-1) = 11.
  port (clk                   : in  std_logic;
        reset                 : in  std_logic;
        enable_control        : in  std_logic;
        update_reg_enable     : out std_logic;
        writeEnable_matrixC   : out std_logic;
        reset_reg             : out std_logic;
        enable_reduce         : out std_logic;
        execute               : out std_logic;
        read_address_matrixA  : out std_logic_vector(address_width - 1 downto 0);
        read_address_matrixB  : out std_logic_vector(address_width - 1 downto 0);
        write_address_matrixC : out std_logic_vector(address_width - 1 downto 0);
        mux_enable            : out std_logic
       );
end component hresultcontrolbooth2;


component hresultreducecontrolbooth2 is
  generic (address_width : integer := 8;  -- address_width must be an even number
           n1            : integer := 5;
           n2            : integer := 11);
  -- The hResultReduceControl module halves the number of columns of
  -- the original 6x8 input matrix, working on the output of the
  -- hResultControl module. Example: the 6x12 result is reduced
  -- to 6x4. n1 is the row position and n2 is the column position.
  -- n1 and n2 should be one less than the 6x12 dimensions because
  -- indexing starts at zero, so n1 = 5 and n2 = 11.
  port (clk                         : in  std_logic;
        reset                       : in  std_logic;
        enable_control              : in  std_logic;
        writeEnable_matrix          : out std_logic;
        enable_vresult              : out std_logic;
        read_address_hResult        : out std_logic_vector(address_width - 1 downto 0);
        write_address_reducehResult : out std_logic_vector(address_width - 1 downto 0);
        mux_enable                  : out std_logic
       );
end component hresultreducecontrolbooth2;


component vresultcontrolbooth2 is
  generic (address_width : integer := 8;
           n1            : integer := 9;
           n2            : integer := 3);
  -- address_width must be an even number
  -- n1 and n2 are the size of the input matrix conv2 with the
  -- 5x1 mask. Example: input 6x4 matrix, mask 5x1, new conv2
  -- matrix equals Conv2 (n1 x n2) = (6+5-1) x (4+1-1),
  -- therefore n1 = 10, n2 = 4, but the matrix starts at
  -- zero so n1 = (10-1) = 9, n2 = (4-1) = 3.
  port (clk                   : in  std_logic;
        reset                 : in  std_logic;
        enable_control        : in  std_logic;
        update_reg_enable     : out std_logic;
        writeEnable_matrixC   : out std_logic;
        reset_reg             : out std_logic;
        enable_reduce         : out std_logic;
        execute               : out std_logic;
        read_address_matrixA  : out std_logic_vector(address_width - 1 downto 0);
        read_address_matrixB  : out std_logic_vector(address_width - 1 downto 0);
        write_address_matrixC : out std_logic_vector(address_width - 1 downto 0)
       );
end component vresultcontrolbooth2;


component vresultreducecontrolbooth2 is
  generic (address_width : integer := 8;
           n1            : integer := 9;
           n2            : integer := 3);
  -- address_width must be an even number
  port (clk                         : in  std_logic;
        reset                       : in  std_logic;
        enable_control              : in  std_logic;
        writeEnable_matrix          : out std_logic;
        enable_vresult              : out std_logic;
        read_address_vResult        : out std_logic_vector(address_width - 1 downto 0);
        write_address_reducevResult : out std_logic_vector(address_width - 1 downto 0)
       );
end component vresultreducecontrolbooth2;


component mux2to1 is
  generic (address_width : integer := 8);
  port (sel        : in  std_logic;
        input1     : in  std_logic_vector(address_width - 1 downto 0);
        input2     : in  std_logic_vector(address_width - 1 downto 0);
        mux_output : out std_logic_vector(address_width - 1 downto 0)
       );
end component mux2to1;

component mux1to2data_width is
  generic (data_width : integer := 24);
  port (sel         : in  std_logic;
        input       : in  std_logic_vector(data_width - 1 downto 0);
        mux_output1 : out std_logic_vector(data_width - 1 downto 0);
        mux_output2 : out std_logic_vector(data_width - 1 downto 0)
       );
end component mux1to2data_width;

component mux2to1data_width is
  generic (data_width : integer := 24);
  port (sel        : in  std_logic;
        input1     : in  std_logic_vector(data_width - 1 downto 0);
        input2     : in  std_logic_vector(data_width - 1 downto 0);
        mux_output : out std_logic_vector(data_width - 1 downto 0)
       );
end component;

component mux8to4enable is
  port (sel         : in  std_logic;
        input1      : in  std_logic;
        input2      : in  std_logic;
        input3      : in  std_logic;
        input4      : in  std_logic;
        input5      : in  std_logic;
        input6      : in  std_logic;
        input7      : in  std_logic;
        input8      : in  std_logic;
        mux_output1 : out std_logic;
        mux_output2 : out std_logic;
        mux_output3 : out std_logic;
        mux_output4 : out std_logic
       );
end component;

component mux2to4enable is
  port (sel         : in  std_logic;
        input1      : in  std_logic;
        input2      : in  std_logic;
        mux_output1 : out std_logic;
        mux_output2 : out std_logic;
        mux_output3 : out std_logic;
        mux_output4 : out std_logic
       );
end component;


component mem_matrix is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        writeEnable          : in  std_logic;
        matrix_values        : in  std_logic_vector(data_width - 1 downto 0);
        write_address        : in  std_logic_vector(address_width - 1 downto 0);
        read_address         : in  std_logic_vector(address_width - 1 downto 0);
        output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
       );
end component mem_matrix;

component mask1x5mem_matrix is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        read_address         : in  std_logic_vector(address_width - 1 downto 0);
        output_matrix_values : out std_logic_vector(data_width - 1 downto 0));
end component mask1x5mem_matrix;

component mask5x1mem_matrix is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        read_address         : in  std_logic_vector(address_width - 1 downto 0);
        output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
       );
end component mask5x1mem_matrix;


component boothconv2 is
  generic (data_width : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        execute              : in  std_logic;
        update_reg_enable    : in  std_logic;
        input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
        input_matrixB_values : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0);
        mult_ready           : out std_logic;
        mult_done            : out std_logic
       );
end component boothconv2;


signal update_reg_enable_to_mux_input1,
       control_writeEnable_memC_to_WriteEnableMatrixC_mux_input3 : std_logic;
signal control_reset_reg_to_hvcontrol_reset_mux_input2,
       enable_reduce_to_hResultReduceControl : std_logic;
signal mux_enable_to_mux_select, writeEnable_matrix_to_MatrixA2,
       enable_vresult_to_vResultControl : std_logic;
signal mux_enable_to_mux_select_matrixA2,
       update_reg_enable_to_mux_input5 : std_logic;
signal control_writeEnable_memC_to_mux_input7 : std_logic;
signal control_reset_reg_to_mux_input6,
       enable_reduce_to_vResultReduceControl_enable : std_logic;
signal writeEnable_matrix_to_MatrixReduce,
       mux_output_to_conv2_update_reg : std_logic;
signal mux_output_to_conv2_reset,
       mux_out_writeEnableMatrixC_to_writeEnableMatrixC : std_logic;
signal read_address_matrixA_to_mux_input1,
       read_address_matrixB_to_mask1x5 :
       std_logic_vector(address_width - 1 downto 0);
signal write_address_matrixC_to_WriteAddressMatrix_mux_input1,
       write_address_reducehResult_to_mux_input1_matrixA2_address :
       std_logic_vector(address_width - 1 downto 0);
signal read_address_hResult_to_mux_input1_readAddressMatrixC,
       write_address_matrixC_to_WriteAddressMatrix_mux_input2 :
       std_logic_vector(address_width - 1 downto 0);
signal write_address_reducevResult_to_mux_input2_matrixA2_address,
       read_address_matrixB_to_mask5x1 :
       std_logic_vector(address_width - 1 downto 0);
signal read_address_vResult_to_mux_input2_readAddressMatrixC,
       write_address_reducevResult_to_MatrixReduce :
       std_logic_vector(address_width - 1 downto 0);
signal mux_ouput_to_matrixA1, mux_output_to_matrixA2_address :
       std_logic_vector(address_width - 1 downto 0);
signal read_address_matrixC_to_address_matrixC,
       mux_output_WriteAddressMatrixC_to_write_address_matrixC :
       std_logic_vector(address_width - 1 downto 0);
signal output_matrix_values_to_mux_input1_matrixA_values,
       output_matrix_values_to_mux_input2_matrixA_values :
       std_logic_vector(data_width - 1 downto 0);
signal mux_output_matrixA_values_to_conv2_input1,
       output_matrix_values_to_mux_input1_matrixB_values :
       std_logic_vector(data_width - 1 downto 0);
signal output_matrix_values_to_mux_input2_matrixB_values,
       mux_output_matrixB_to_conv2_input2 :
       std_logic_vector(data_width - 1 downto 0);
signal output_matrixC_values_to_mux_input_values,
       mux_output_MatrixC_values_to_matrixA2_values :
       std_logic_vector(data_width - 1 downto 0);
signal mux_output_MatrixC_values_to_matrixReduce_values :
       std_logic_vector(data_width - 1 downto 0);
signal output_data_matrixC_to_matrixC_values :
       std_logic_vector(data_width - 1 downto 0);
signal execute_to_mux_input4, execute_to_mux_input8,
       mux_out_execute_to_boothconv2 : std_logic;
signal mult_ready_to_mux_input1, mult_done_to_mux_input2 : std_logic;


begin 

hresultcontrolbooth1 : hresultcontrolbooth2 port map (
  clk                   => clk,
  reset                 => reset,
  enable_control        => enable_conv2_module,
  update_reg_enable     => update_reg_enable_to_mux_input1,
  writeEnable_matrixC   => control_writeEnable_memC_to_WriteEnableMatrixC_mux_input3,
  reset_reg             => control_reset_reg_to_hvcontrol_reset_mux_input2,
  enable_reduce         => enable_reduce_to_hResultReduceControl,
  execute               => execute_to_mux_input4,
  read_address_matrixA  => read_address_matrixA_to_mux_input1,
  read_address_matrixB  => read_address_matrixB_to_mask1x5,
  write_address_matrixC => write_address_matrixC_to_WriteAddressMatrix_mux_input1,
  mux_enable            => mux_enable_to_mux_select);




hresultreducecontrolbooth1 : hresultreducecontrolbooth2 port map (
  clk                         => clk,
  reset                       => reset,
  enable_control              => enable_reduce_to_hResultReduceControl,
  writeEnable_matrix          => writeEnable_matrix_to_MatrixA2,
  enable_vresult              => enable_vresult_to_vResultControl,
  read_address_hResult        => read_address_hResult_to_mux_input1_readAddressMatrixC,
  write_address_reducehResult => write_address_reducehResult_to_mux_input1_matrixA2_address,
  mux_enable                  => mux_enable_to_mux_select_matrixA2);

vresultcontrolbooth1 : vresultcontrolbooth2 port map (
  clk                   => clk,
  reset                 => reset,
  enable_control        => enable_vresult_to_vResultControl,
  update_reg_enable     => update_reg_enable_to_mux_input5,
  writeEnable_matrixC   => control_writeEnable_memC_to_mux_input7,
  reset_reg             => control_reset_reg_to_mux_input6,
  enable_reduce         => enable_reduce_to_vResultReduceControl_enable,
  execute               => execute_to_mux_input8,
  read_address_matrixA  => write_address_reducevResult_to_mux_input2_matrixA2_address,
  read_address_matrixB  => read_address_matrixB_to_mask5x1,
  write_address_matrixC => write_address_matrixC_to_WriteAddressMatrix_mux_input2);

vresultreducecontrolbooth1 : vresultreducecontrolbooth2 port map (
  clk                         => clk,
  reset                       => reset,
  enable_control              => enable_reduce_to_vResultReduceControl_enable,
  writeEnable_matrix          => writeEnable_matrix_to_MatrixReduce,
  enable_vresult              => reduce_module_finished,
  read_address_vResult        => read_address_vResult_to_mux_input2_readAddressMatrixC,
  write_address_reducevResult => write_address_reducevResult_to_MatrixReduce);


mux2to1_matrixA1 : mux2to1 port map (
  sel        => enable_matrixA,
  input1     => read_address_matrixA_to_mux_input1,
  input2     => read_address_matrixA,
  mux_output => mux_ouput_to_matrixA1);

mux2to1_matrixA_values : mux2to1data_width port map (
  sel        => mux_enable_to_mux_select,
  input1     => output_matrix_values_to_mux_input1_matrixA_values,
  input2     => output_matrix_values_to_mux_input2_matrixA_values,
  mux_output => mux_output_matrixA_values_to_conv2_input1);

mux2to1_matrixB_values : mux2to1data_width port map (
  sel        => mux_enable_to_mux_select,
  input1     => output_matrix_values_to_mux_input1_matrixB_values,
  input2     => output_matrix_values_to_mux_input2_matrixB_values,
  mux_output => mux_output_matrixB_to_conv2_input2);

mux8to4_Enable1 : mux8to4enable port map (
  sel         => mux_enable_to_mux_select,
  input1      => update_reg_enable_to_mux_input1,
  input2      => control_reset_reg_to_hvcontrol_reset_mux_input2,
  input3      => control_writeEnable_memC_to_WriteEnableMatrixC_mux_input3,
  input4      => execute_to_mux_input4,
  input5      => update_reg_enable_to_mux_input5,
  input6      => control_reset_reg_to_mux_input6,
  input7      => control_writeEnable_memC_to_mux_input7,
  input8      => execute_to_mux_input8,
  mux_output1 => mux_output_to_conv2_update_reg,
  mux_output2 => mux_output_to_conv2_reset,
  mux_output3 => mux_out_writeEnableMatrixC_to_writeEnableMatrixC,
  mux_output4 => mux_out_execute_to_boothconv2);

mux2to4_Enable1 : mux2to4enable port map (
  sel         => mux_enable_to_mux_select,
  input1      => mult_ready_to_mux_input1,
  input2      => mult_done_to_mux_input2,
  mux_output1 => mult_ready_hresult,
  mux_output2 => mult_done_hresult,
  mux_output3 => mult_ready_vresult,
  mux_output4 => mult_done_vresult);

mux2to1_ReadAddressMatrixC : mux2to1 port map (
  sel        => mux_enable_to_mux_select_matrixA2,
  input1     => read_address_hResult_to_mux_input1_readAddressMatrixC,
  input2     => read_address_vResult_to_mux_input2_readAddressMatrixC,
  mux_output => read_address_matrixC_to_address_matrixC);

mux2to1_WriteAddressMatrixC : mux2to1 port map (
  sel        => mux_enable_to_mux_select,
  input1     => write_address_matrixC_to_WriteAddressMatrix_mux_input1,
  input2     => write_address_matrixC_to_WriteAddressMatrix_mux_input2,
  mux_output => mux_output_WriteAddressMatrixC_to_write_address_matrixC);
mux1to2_matrixA2_ReduceMatrix_values : mux1to2data_width port map (
  sel         => mux_enable_to_mux_select_matrixA2,
  input       => output_matrixC_values_to_mux_input_values,
  mux_output1 => mux_output_MatrixC_values_to_matrixA2_values,
  mux_output2 => mux_output_MatrixC_values_to_matrixReduce_values);

mem_matrixA1 : mem_matrix port map (
  clk                  => clk,
  reset                => reset,
  writeEnable          => enable_matrixA,
  matrix_values        => input_matrixA_values,
  write_address        => mux_ouput_to_matrixA1,
  read_address         => mux_ouput_to_matrixA1,
  output_matrix_values => output_matrix_values_to_mux_input1_matrixA_values);

mem_matrixA2 : mem_matrix port map (
  clk                  => clk,
  reset                => reset,
  writeEnable          => writeEnable_matrix_to_MatrixA2,
  matrix_values        => mux_output_MatrixC_values_to_matrixA2_values,
  write_address        => write_address_reducehResult_to_mux_input1_matrixA2_address,
  read_address         => write_address_reducevResult_to_mux_input2_matrixA2_address,
  output_matrix_values => output_matrix_values_to_mux_input2_matrixA_values);

mask1x5mem_matrix0 : mask1x5mem_matrix port map (
  clk                  => clk,
  reset                => reset,
  read_address         => read_address_matrixB_to_mask1x5,
  output_matrix_values => output_matrix_values_to_mux_input1_matrixB_values);

mask5x1mem_matrix0 : mask5x1mem_matrix port map (
  clk                  => clk,
  reset                => reset,
  read_address         => read_address_matrixB_to_mask5x1,
  output_matrix_values => output_matrix_values_to_mux_input2_matrixB_values);

boothconv2_1 : boothconv2 port map (
  clk                  => clk,
  reset                => mux_output_to_conv2_reset,
  execute              => mux_out_execute_to_boothconv2,
  update_reg_enable    => mux_output_to_conv2_update_reg,
  input_matrixA_values => mux_output_matrixA_values_to_conv2_input1,
  input_matrixB_values => mux_output_matrixB_to_conv2_input2,
  output_data_matrixC  => output_data_matrixC_to_matrixC_values,
  mult_ready           => mult_ready_to_mux_input1,
  mult_done            => mult_done_to_mux_input2);

mem_matrixC : mem_matrix port map (
  clk                  => clk,
  reset                => reset,
  writeEnable          => mux_out_writeEnableMatrixC_to_writeEnableMatrixC,
  matrix_values        => output_data_matrixC_to_matrixC_values,
  write_address        => mux_output_WriteAddressMatrixC_to_write_address_matrixC,
  read_address         => read_address_matrixC_to_address_matrixC,
  output_matrix_values => output_matrixC_values_to_mux_input_values);

mem_matrixReduce : mem_matrix port map (
  clk                  => clk,
  reset                => reset,
  writeEnable          => writeEnable_matrix_to_MatrixReduce,
  matrix_values        => mux_output_MatrixC_values_to_matrixReduce_values,
  write_address        => write_address_reducevResult_to_MatrixReduce,
  read_address         => read_address_matrixReduce,
  output_matrix_values => output_data_matrixReduce);

end structure_reduceconv2_modulebooth2;
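The generic comments in the control components above derive n1 and n2 by hand from the input and mask sizes. The arithmetic can be captured in a small helper, sketched below. The package name conv2_dims_pkg, the function conv2_last_index, and the testbench conv2_dims_tb are hypothetical and are not part of the thesis sources; the sketch only assumes the full-convolution size rule quoted in the comments (output size = input size + mask size - 1, with zero-based indexing).

```vhdl
-- Hypothetical helper package (not part of the thesis sources): derives the
-- zero-based n1/n2 generic values from the input and mask dimensions.
package conv2_dims_pkg is
  -- Full convolution along one dimension has input_size + mask_size - 1
  -- elements, so the last zero-based index is one less than that.
  function conv2_last_index (input_size : positive;
                             mask_size  : positive) return natural;
end package conv2_dims_pkg;

package body conv2_dims_pkg is
  function conv2_last_index (input_size : positive;
                             mask_size  : positive) return natural is
  begin
    return input_size + mask_size - 2;
  end function conv2_last_index;
end package body conv2_dims_pkg;

use work.conv2_dims_pkg.all;

-- Self-checking testbench for the two cases given in the listing comments.
entity conv2_dims_tb is
end entity conv2_dims_tb;

architecture sim of conv2_dims_tb is
  -- Horizontal pass: 6x8 input convolved with the 1x5 mask.
  constant h_n1 : natural := conv2_last_index(6, 1);
  constant h_n2 : natural := conv2_last_index(8, 5);
  -- Vertical pass: 6x4 input convolved with the 5x1 mask.
  constant v_n1 : natural := conv2_last_index(6, 5);
  constant v_n2 : natural := conv2_last_index(4, 1);
begin
  check : process
  begin
    assert h_n1 = 5 and h_n2 = 11
      report "horizontal generics do not match the listing comments"
      severity failure;
    assert v_n1 = 9 and v_n2 = 3
      report "vertical generics do not match the listing comments"
      severity failure;
    wait;
  end process check;
end architecture sim;
```

With such a helper, the generic map of a control component could in principle be written in terms of the image and mask sizes instead of hard-coded values; whether that suits the synthesis flow used in this work is untested.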


C.5  Compute  Derivatives  Fx  using  Booth  Multiplier 
Listing C.5: The FxComputeDerivativesBoothConv2Module.vhd VHDL file.
(appendix3/FxComputeDerivativesBoothConv2Module.vhd)

-- Capt. Jason Shirley
-- This is the FxComputeDerivatives Module using the
-- Booth Multiplier. These modules are structurally connected.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use ieee.numeric_std.all;

entity fxcomputederivativesboothconv2_module is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                                 : in  std_logic;
        reset                               : in  std_logic;
        enable_matrix_A_B                   : in  std_logic;
        enable_conv2_module                 : in  std_logic;
        read_address_matrixA                : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixB                : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixC                : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values                : in  std_logic_vector(data_width - 1 downto 0);
        input_matrixB_values                : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC                 : out std_logic_vector(data_width - 1 downto 0);
        enable_fxcomputederivative_complete : out std_logic;
        writeEnable_matrix_conv2A           : out std_logic;
        writeEnable_matrix_conv2B           : out std_logic;
        mult_ready_conv2A                   : out std_logic;
        mult_done_conv2A                    : out std_logic;
        mult_ready_conv2B                   : out std_logic;
        mult_done_conv2B                    : out std_logic
       );
end fxcomputederivativesboothconv2_module;


architecture structure_fxcomputederivativesboothconv2_module of
  fxcomputederivativesboothconv2_module is

component boothconv2_module is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        enable_matrixA       : in  std_logic;
        enable_conv2_module  : in  std_logic;
        read_address_matrixA : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixC : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0);
        enable_complete      : out std_logic;
        writeEnable_matrixC  : out std_logic;
        mult_ready           : out std_logic;
        mult_done            : out std_logic
       );
end component boothconv2_module;


component adder is
  generic (data_width : integer := 24);
  port (input_mult_data : in  std_logic_vector(data_width - 1 downto 0);
        input_reg_data  : in  std_logic_vector(data_width - 1 downto 0);
        output_data     : out std_logic_vector(data_width - 1 downto 0)
       );
end component adder;


component fxfyftcontroladder is
  generic (address_width : integer := 8;  -- address_width must be an even number
           n1            : integer := 5;
           n2            : integer := 7);
  -- n1 and n2 are the size of the input matrix. For example, a 6x8
  -- matrix starts at zero, so n1 = (6-1) = 5, n2 = (8-1) = 7.
  port (clk                                : in  std_logic;
        reset                              : in  std_logic;
        enable_control_conv2A              : in  std_logic;
        enable_control_conv2B              : in  std_logic;
        writeEnable_matrix_image1_image2   : out std_logic;
        enable_fxfy_complete               : out std_logic;
        read_address_matrix_image1         : out std_logic_vector(address_width - 1 downto 0);
        read_address_matrix_image2         : out std_logic_vector(address_width - 1 downto 0);
        write_address_matrix_image1_image2 : out std_logic_vector(address_width - 1 downto 0)
       );
end component fxfyftcontroladder;


component mem_matrix is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                  : in  std_logic;
        reset                : in  std_logic;
        writeEnable          : in  std_logic;
        matrix_values        : in  std_logic_vector(data_width - 1 downto 0);
        write_address        : in  std_logic_vector(address_width - 1 downto 0);
        read_address         : in  std_logic_vector(address_width - 1 downto 0);
        output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
       );
end component mem_matrix;


signal enable_complete_to_enable_control_conv2A,
       enable_complete_to_enable_conrol_conv2B : std_logic;
signal writeEnable_matrix_image1_image2_to_writeEnable : std_logic;
signal mux_ouput_matrixA, mux_ouput_matrixB :
       std_logic_vector(address_width - 1 downto 0);
signal control_matrixA_to_muxA_input1,
       control_matrixB_to_muxB_input1 :
       std_logic_vector(address_width - 1 downto 0);
signal write_address_matrix_image1_image2_to_writeEnable :
       std_logic_vector(address_width - 1 downto 0);
signal read_address_matrix_image1_to_read_address_conv2A,
       read_address_matrix_image2_to_read_address_conv2B :
       std_logic_vector(address_width - 1 downto 0);
signal output_data_matrix_conv2A_to_input_mult_data,
       output_data_matrix_conv2B_to_input_reg_data :
       std_logic_vector(data_width - 1 downto 0);
signal output_data_to_mem_matrix_input_values :
       std_logic_vector(data_width - 1 downto 0);

begin 

boothconv2_module1 : boothconv2_module port map (
  clk                  => clk,
  reset                => reset,
  enable_matrixA       => enable_matrix_A_B,
  enable_conv2_module  => enable_conv2_module,
  read_address_matrixA => read_address_matrixA,
  read_address_matrixC => read_address_matrix_image1_to_read_address_conv2A,
  input_matrixA_values => input_matrixA_values,
  output_data_matrixC  => output_data_matrix_conv2A_to_input_mult_data,
  enable_complete      => enable_complete_to_enable_control_conv2A,
  writeEnable_matrixC  => writeEnable_matrix_conv2A,
  mult_ready           => mult_ready_conv2A,
  mult_done            => mult_done_conv2A);

boothconv2_module2 : boothconv2_module port map (
  clk                  => clk,
  reset                => reset,
  enable_matrixA       => enable_matrix_A_B,
  enable_conv2_module  => enable_conv2_module,
  read_address_matrixA => read_address_matrixB,
  read_address_matrixC => read_address_matrix_image2_to_read_address_conv2B,
  input_matrixA_values => input_matrixB_values,
  output_data_matrixC  => output_data_matrix_conv2B_to_input_reg_data,
  enable_complete      => enable_complete_to_enable_conrol_conv2B,
  writeEnable_matrixC  => writeEnable_matrix_conv2B,
  mult_ready           => mult_ready_conv2B,
  mult_done            => mult_done_conv2B);

fxfyftcontroladder0 : fxfyftcontroladder port map (
  clk                                => clk,
  reset                              => reset,
  enable_control_conv2A              => enable_complete_to_enable_control_conv2A,
  enable_control_conv2B              => enable_complete_to_enable_conrol_conv2B,
  writeEnable_matrix_image1_image2   => writeEnable_matrix_image1_image2_to_writeEnable,
  enable_fxfy_complete               => enable_fxcomputederivative_complete,
  read_address_matrix_image1         => read_address_matrix_image1_to_read_address_conv2A,
  read_address_matrix_image2         => read_address_matrix_image2_to_read_address_conv2B,
  write_address_matrix_image1_image2 => write_address_matrix_image1_image2_to_writeEnable);

mem_matrixC : mem_matrix port map (
  clk                  => clk,
  reset                => reset,
  writeEnable          => writeEnable_matrix_image1_image2_to_writeEnable,
  matrix_values        => output_data_to_mem_matrix_input_values,
  write_address        => write_address_matrix_image1_image2_to_writeEnable,
  read_address         => read_address_matrixC,
  output_matrix_values => output_data_matrixC);

adder1 : adder port map (
  input_mult_data => output_data_matrix_conv2A_to_input_mult_data,
  input_reg_data  => output_data_matrix_conv2B_to_input_reg_data,
  output_data     => output_data_to_mem_matrix_input_values);

end structure_fxcomputederivativesboothconv2_module;
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C.6  Compute  Derivatives  Fy  using  Booth  Multiplier 

Listing  C.6:  The  FyComputeDerivativesBoothConv2Module.vhd  VHDL  file. 

(appendix3/FyComputeDerivativesBoothConv2Modulc.vhd) 

-- Capt. Jason Shirley
-- This is the FyComputeDerivatives Module using the
-- Booth Multiplier. These Modules are structurally connected

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use ieee.numeric_std.all;

entity fycomputederivativesboothconv2_module is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                                 : in  std_logic;
        reset                               : in  std_logic;
        enable_matrix_A_B                   : in  std_logic;
        enable_conv2_module                 : in  std_logic;
        read_address_matrixA                : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixB                : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixC                : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values                : in  std_logic_vector(data_width - 1 downto 0);
        input_matrixB_values                : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC                 : out std_logic_vector(data_width - 1 downto 0);
        enable_fycomputederivative_complete : out std_logic;
        writeEnable_matrix_conv2A           : out std_logic;
        writeEnable_matrix_conv2B           : out std_logic;
        mult_ready_conv2A                   : out std_logic;
        mult_done_conv2A                    : out std_logic;
        mult_ready_conv2B                   : out std_logic;
        mult_done_conv2B                    : out std_logic
  );
end fycomputederivativesboothconv2_module;

architecture structure_fycomputederivativesboothconv2_module of
  fycomputederivativesboothconv2_module is

  component boothconv2_module is
    generic (address_width : integer := 8;
             data_width    : integer := 24);
    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          enable_matrixA       : in  std_logic;
          enable_conv2_module  : in  std_logic;
          read_address_matrixA : in  std_logic_vector(address_width - 1 downto 0);
          read_address_matrixC : in  std_logic_vector(address_width - 1 downto 0);
          input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
          output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0);
          enable_complete      : out std_logic;
          writeEnable_matrixC  : out std_logic;
          mult_ready           : out std_logic;
          mult_done            : out std_logic
    );
  end component boothconv2_module;

  component adder is
    generic (data_width : integer := 24);
    port (input_mult_data : in  std_logic_vector(data_width - 1 downto 0);
          input_reg_data  : in  std_logic_vector(data_width - 1 downto 0);
          output_data     : out std_logic_vector(data_width - 1 downto 0)
    );
  end component adder;


  component fxfyftcontroladder is
    generic (address_width : integer := 8;
             n1            : integer := 5;
             n2            : integer := 7);
    -- address_width must be an even number
    -- n1 and n2 are the size of the input matrix, for example 6x8
    -- matrix starts at zero so, n1 = (6-1) = 5, n2 = (8-1) = 7.
    port (clk                                : in  std_logic;
          reset                              : in  std_logic;
          enable_control_conv2A              : in  std_logic;
          enable_control_conv2B              : in  std_logic;
          writeEnable_matrix_image1_image2   : out std_logic;
          enable_fxfy_complete               : out std_logic;
          read_address_matrix_image1         : out std_logic_vector(address_width - 1 downto 0);
          read_address_matrix_image2         : out std_logic_vector(address_width - 1 downto 0);
          write_address_matrix_image1_image2 : out std_logic_vector(address_width - 1 downto 0)
    );
  end component fxfyftcontroladder;

  component mem_matrix is
    generic (address_width : integer := 8;
             data_width    : integer := 24);
    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          writeEnable          : in  std_logic;
          matrix_values        : in  std_logic_vector(data_width - 1 downto 0);
          write_address        : in  std_logic_vector(address_width - 1 downto 0);
          read_address         : in  std_logic_vector(address_width - 1 downto 0);
          output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
    );
  end component mem_matrix;


  signal enable_complete_to_enable_control_conv2A,
         enable_complete_to_enable_conrol_conv2B : std_logic;
  signal writeEnable_matrix_image1_image2_to_writeEnable : std_logic;
  signal mux_ouput_matrixA, mux_ouput_matrixB : std_logic_vector(address_width - 1 downto 0);
  signal control_matrixA_to_muxA_input1,
         control_matrixB_to_muxB_input1 : std_logic_vector(address_width - 1 downto 0);
  signal write_address_matrix_image1_image2_to_writeEnable : std_logic_vector(address_width - 1 downto 0);
  signal read_address_matrix_image1_to_read_address_conv2A,
         read_address_matrix_image2_to_read_address_conv2B : std_logic_vector(address_width - 1 downto 0);
  signal output_data_matrix_conv2A_to_input_mult_data,
         output_data_matrix_conv2B_to_input_reg_data : std_logic_vector(data_width - 1 downto 0);
  signal output_data_to_mem_matrix_input_values : std_logic_vector(data_width - 1 downto 0);

begin

  boothconv2_module1 : boothconv2_module port map (
    clk                  => clk,
    reset                => reset,
    enable_matrixA       => enable_matrix_A_B,
    enable_conv2_module  => enable_conv2_module,
    read_address_matrixA => read_address_matrixA,
    read_address_matrixC => read_address_matrix_image1_to_read_address_conv2A,
    input_matrixA_values => input_matrixA_values,
    output_data_matrixC  => output_data_matrix_conv2A_to_input_mult_data,
    enable_complete      => enable_complete_to_enable_control_conv2A,
    writeEnable_matrixC  => writeEnable_matrix_conv2A,
    mult_ready           => mult_ready_conv2A,
    mult_done            => mult_done_conv2A);

  boothconv2_module2 : boothconv2_module port map (
    clk                  => clk,
    reset                => reset,
    enable_matrixA       => enable_matrix_A_B,
    enable_conv2_module  => enable_conv2_module,
    read_address_matrixA => read_address_matrixB,
    read_address_matrixC => read_address_matrix_image2_to_read_address_conv2B,
    input_matrixA_values => input_matrixB_values,
    output_data_matrixC  => output_data_matrix_conv2B_to_input_reg_data,
    enable_complete      => enable_complete_to_enable_conrol_conv2B,
    writeEnable_matrixC  => writeEnable_matrix_conv2B,
    mult_ready           => mult_ready_conv2B,
    mult_done            => mult_done_conv2B);

  fxfyftcontroladder0 : fxfyftcontroladder port map (
    clk                                => clk,
    reset                              => reset,
    enable_control_conv2A              => enable_complete_to_enable_control_conv2A,
    enable_control_conv2B              => enable_complete_to_enable_conrol_conv2B,
    writeEnable_matrix_image1_image2   => writeEnable_matrix_image1_image2_to_writeEnable,
    enable_fxfy_complete               => enable_fycomputederivative_complete,
    read_address_matrix_image1         => read_address_matrix_image1_to_read_address_conv2A,
    read_address_matrix_image2         => read_address_matrix_image2_to_read_address_conv2B,
    write_address_matrix_image1_image2 => write_address_matrix_image1_image2_to_writeEnable);

  mem_matrixC : mem_matrix port map (
    clk                  => clk,
    reset                => reset,
    writeEnable          => writeEnable_matrix_image1_image2_to_writeEnable,
    matrix_values        => output_data_to_mem_matrix_input_values,
    write_address        => write_address_matrix_image1_image2_to_writeEnable,
    read_address         => read_address_matrixC,
    output_matrix_values => output_data_matrixC);

  adder1 : adder port map (
    input_mult_data => output_data_matrix_conv2A_to_input_mult_data,
    input_reg_data  => output_data_matrix_conv2B_to_input_reg_data,
    output_data     => output_data_to_mem_matrix_input_values);

end structure_fycomputederivativesboothconv2_module;


C.7 Compute Derivatives Ft using Booth Multiplier


Listing  C.7:  The  FtComputeDerivativesBoothConv2Module.vhd  VHDL  file. 

(appendix3/FtComputeDerivativesBoothConv2Module.vhd) 


-- Capt. Jason Shirley
-- This is the FtComputeDerivatives Module using the
-- Booth Multiplier. These Modules are structurally connected

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use ieee.numeric_std.all;

entity ftcomputederivativesboothconv2_module is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                                 : in  std_logic;
        reset                               : in  std_logic;
        enable_matrix_A_B                   : in  std_logic;
        enable_conv2_module                 : in  std_logic;
        read_address_matrixA                : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixB                : in  std_logic_vector(address_width - 1 downto 0);
        read_address_matrixC                : in  std_logic_vector(address_width - 1 downto 0);
        input_matrixA_values                : in  std_logic_vector(data_width - 1 downto 0);
        input_matrixB_values                : in  std_logic_vector(data_width - 1 downto 0);
        output_data_matrixC                 : out std_logic_vector(data_width - 1 downto 0);
        enable_ftcomputederivative_complete : out std_logic;
        writeEnable_matrix_conv2A           : out std_logic;
        writeEnable_matrix_conv2B           : out std_logic;
        mult_ready_conv2A                   : out std_logic;
        mult_done_conv2A                    : out std_logic;
        mult_ready_conv2B                   : out std_logic;
        mult_done_conv2B                    : out std_logic
  );
end ftcomputederivativesboothconv2_module;


architecture structure_ftcomputederivativesboothconv2_module of
  ftcomputederivativesboothconv2_module is

  component boothconv2_image1_module is
    generic (address_width : integer := 8;
             data_width    : integer := 24);
    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          enable_matrixA       : in  std_logic;
          enable_conv2_module  : in  std_logic;
          read_address_matrixA : in  std_logic_vector(address_width - 1 downto 0);
          read_address_matrixC : in  std_logic_vector(address_width - 1 downto 0);
          input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
          output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0);
          enable_complete      : out std_logic;
          writeEnable_matrixC  : out std_logic;
          mult_ready           : out std_logic;
          mult_done            : out std_logic
    );
  end component boothconv2_image1_module;

  component boothconv2_image2_module is
    generic (address_width : integer := 8;
             data_width    : integer := 24);
    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          enable_matrixA       : in  std_logic;
          enable_conv2_module  : in  std_logic;
          read_address_matrixA : in  std_logic_vector(address_width - 1 downto 0);
          read_address_matrixC : in  std_logic_vector(address_width - 1 downto 0);
          input_matrixA_values : in  std_logic_vector(data_width - 1 downto 0);
          output_data_matrixC  : out std_logic_vector(data_width - 1 downto 0);
          enable_complete      : out std_logic;
          writeEnable_matrixC  : out std_logic;
          mult_ready           : out std_logic;
          mult_done            : out std_logic
    );
  end component boothconv2_image2_module;


  component adder is
    generic (data_width : integer := 24);
    port (input_mult_data : in  std_logic_vector(data_width - 1 downto 0);
          input_reg_data  : in  std_logic_vector(data_width - 1 downto 0);
          output_data     : out std_logic_vector(data_width - 1 downto 0)
    );
  end component adder;

  component fxfyftcontroladder is
    generic (address_width : integer := 8;
             n1            : integer := 5;
             n2            : integer := 7);
    -- address_width must be an even number
    -- n1 and n2 are the size of the input matrix, for example 6x8
    -- matrix starts at zero so, n1 = (6-1) = 5, n2 = (8-1) = 7.
    port (clk                                : in  std_logic;
          reset                              : in  std_logic;
          enable_control_conv2A              : in  std_logic;
          enable_control_conv2B              : in  std_logic;
          writeEnable_matrix_image1_image2   : out std_logic;
          enable_fxfy_complete               : out std_logic;
          read_address_matrix_image1         : out std_logic_vector(address_width - 1 downto 0);
          read_address_matrix_image2         : out std_logic_vector(address_width - 1 downto 0);
          write_address_matrix_image1_image2 : out std_logic_vector(address_width - 1 downto 0)
    );
  end component fxfyftcontroladder;

  component mem_matrix is
    generic (address_width : integer := 8;
             data_width    : integer := 24);
    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          writeEnable          : in  std_logic;
          matrix_values        : in  std_logic_vector(data_width - 1 downto 0);
          write_address        : in  std_logic_vector(address_width - 1 downto 0);
          read_address         : in  std_logic_vector(address_width - 1 downto 0);
          output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
    );
  end component mem_matrix;


  signal enable_complete_to_enable_control_conv2A,
         enable_complete_to_enable_conrol_conv2B : std_logic;
  signal writeEnable_matrix_image1_image2_to_writeEnable : std_logic;
  signal mux_ouput_matrixA, mux_ouput_matrixB : std_logic_vector(address_width - 1 downto 0);
  signal control_matrixA_to_muxA_input1,
         control_matrixB_to_muxB_input1 : std_logic_vector(address_width - 1 downto 0);
  signal write_address_matrix_image1_image2_to_writeEnable : std_logic_vector(address_width - 1 downto 0);
  signal read_address_matrix_image1_to_read_address_conv2A,
         read_address_matrix_image2_to_read_address_conv2B : std_logic_vector(address_width - 1 downto 0);
  signal output_data_matrix_conv2A_to_input_mult_data,
         output_data_matrix_conv2B_to_input_reg_data : std_logic_vector(data_width - 1 downto 0);
  signal output_data_to_mem_matrix_input_values : std_logic_vector(data_width - 1 downto 0);

begin

  boothconv2_image1_module1 : boothconv2_image1_module port map (
    clk                  => clk,
    reset                => reset,
    enable_matrixA       => enable_matrix_A_B,
    enable_conv2_module  => enable_conv2_module,
    read_address_matrixA => read_address_matrixA,
    read_address_matrixC => read_address_matrix_image1_to_read_address_conv2A,
    input_matrixA_values => input_matrixA_values,
    output_data_matrixC  => output_data_matrix_conv2A_to_input_mult_data,
    enable_complete      => enable_complete_to_enable_control_conv2A,
    writeEnable_matrixC  => writeEnable_matrix_conv2A,
    mult_ready           => mult_ready_conv2A,
    mult_done            => mult_done_conv2A);

  boothconv2_image2_module2 : boothconv2_image2_module port map (
    clk                  => clk,
    reset                => reset,
    enable_matrixA       => enable_matrix_A_B,
    enable_conv2_module  => enable_conv2_module,
    read_address_matrixA => read_address_matrixB,
    read_address_matrixC => read_address_matrix_image2_to_read_address_conv2B,
    input_matrixA_values => input_matrixB_values,
    output_data_matrixC  => output_data_matrix_conv2B_to_input_reg_data,
    enable_complete      => enable_complete_to_enable_conrol_conv2B,
    writeEnable_matrixC  => writeEnable_matrix_conv2B,
    mult_ready           => mult_ready_conv2B,
    mult_done            => mult_done_conv2B);

  fxfyftcontroladder0 : fxfyftcontroladder port map (
    clk                                => clk,
    reset                              => reset,
    enable_control_conv2A              => enable_complete_to_enable_control_conv2A,
    enable_control_conv2B              => enable_complete_to_enable_conrol_conv2B,
    writeEnable_matrix_image1_image2   => writeEnable_matrix_image1_image2_to_writeEnable,
    enable_fxfy_complete               => enable_ftcomputederivative_complete,
    read_address_matrix_image1         => read_address_matrix_image1_to_read_address_conv2A,
    read_address_matrix_image2         => read_address_matrix_image2_to_read_address_conv2B,
    write_address_matrix_image1_image2 => write_address_matrix_image1_image2_to_writeEnable);

  mem_matrixC : mem_matrix port map (
    clk                  => clk,
    reset                => reset,
    writeEnable          => writeEnable_matrix_image1_image2_to_writeEnable,
    matrix_values        => output_data_to_mem_matrix_input_values,
    write_address        => write_address_matrix_image1_image2_to_writeEnable,
    read_address         => read_address_matrixC,
    output_matrix_values => output_data_matrixC);

  adder1 : adder port map (
    input_mult_data => output_data_matrix_conv2A_to_input_mult_data,
    input_reg_data  => output_data_matrix_conv2B_to_input_reg_data,
    output_data     => output_data_to_mem_matrix_input_values);

end structure_ftcomputederivativesboothconv2_module;


C.8 Matrix Transpose

Listing C.8: The TransposeMatrixModule.vhd VHDL file.

(appendix3/TransposeMatrixModule.vhd)

-- Capt. Jason Shirley
-- This is the Transpose Matrix Module
-- These Modules are structurally connected

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.std_logic_arith.all;
use ieee.numeric_std.all;

entity transpose_matrix_module is
  generic (address_width : integer := 8;
           data_width    : integer := 24);
  port (clk                           : in  std_logic;
        reset                         : in  std_logic;
        enable_transpose              : in  std_logic;
        enable_matrix_original        : in  std_logic;
        read_address_matrix_original  : in  std_logic_vector(address_width - 1 downto 0);
        input_matrix_original_values  : in  std_logic_vector(data_width - 1 downto 0);
        read_address_matrix_transpose : in  std_logic_vector(address_width - 1 downto 0);
        output_data_matrix_transpose  : out std_logic_vector(data_width - 1 downto 0);
        enable_transpose_complete     : out std_logic
  );
end transpose_matrix_module;


architecture structure_transpose_matrix_module of
  transpose_matrix_module is

  component transposecontrol is
    generic (address_width : integer := 8;
             n1            : integer := 5;
             n2            : integer := 7);
    -- address_width must be an even number
    -- n1 and n2 are the size of the input matrix, for example 6x8
    -- matrix starts at zero so, n1 = (6-1) = 5, n2 = (8-1) = 7.
    port (clk                            : in  std_logic;
          reset                          : in  std_logic;
          enable_control                 : in  std_logic;
          writeEnable_matrix_transpose   : out std_logic;
          enable_transpose_complete      : out std_logic;
          read_address_matrix_original   : out std_logic_vector(address_width - 1 downto 0);
          write_address_matrix_transpose : out std_logic_vector(address_width - 1 downto 0)
    );
  end component transposecontrol;

  component mem_matrix is
    generic (address_width : integer := 8;
             data_width    : integer := 24);
    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          writeEnable          : in  std_logic;
          matrix_values        : in  std_logic_vector(data_width - 1 downto 0);
          write_address        : in  std_logic_vector(address_width - 1 downto 0);
          read_address         : in  std_logic_vector(address_width - 1 downto 0);
          output_matrix_values : out std_logic_vector(data_width - 1 downto 0)
    );
  end component mem_matrix;

  component mux2to1 is
    generic (address_width : integer := 8);
    port (sel        : in  std_logic;
          input1     : in  std_logic_vector(address_width - 1 downto 0);
          input2     : in  std_logic_vector(address_width - 1 downto 0);
          mux_output : out std_logic_vector(address_width - 1 downto 0)
    );
  end component mux2to1;


  signal writeEnable_matrix_transpose_to_mem_matrix_transpose : std_logic;
  signal control_matrix_original_to_mux_input1 : std_logic_vector(address_width - 1 downto 0);
  signal control_address_transpose_to_transpose_matrix : std_logic_vector(address_width - 1 downto 0);
  signal mux_output_to_original_matrix : std_logic_vector(address_width - 1 downto 0);
  signal output_matrix_original_values_to_matrix_transpose_values : std_logic_vector(data_width - 1 downto 0);

begin

  transposecontrol1 : transposecontrol port map (
    clk                            => clk,
    reset                          => reset,
    enable_control                 => enable_transpose,
    writeEnable_matrix_transpose   => writeEnable_matrix_transpose_to_mem_matrix_transpose,
    enable_transpose_complete      => enable_transpose_complete,
    read_address_matrix_original   => control_matrix_original_to_mux_input1,
    write_address_matrix_transpose => control_address_transpose_to_transpose_matrix);

  mem_matrix_orginal : mem_matrix port map (
    clk                  => clk,
    reset                => reset,
    writeEnable          => enable_matrix_original,
    matrix_values        => input_matrix_original_values,
    write_address        => mux_output_to_original_matrix,
    read_address         => mux_output_to_original_matrix,
    output_matrix_values => output_matrix_original_values_to_matrix_transpose_values);

  mem_matrix_transpose : mem_matrix port map (
    clk                  => clk,
    reset                => reset,
    writeEnable          => writeEnable_matrix_transpose_to_mem_matrix_transpose,
    matrix_values        => output_matrix_original_values_to_matrix_transpose_values,
    write_address        => control_address_transpose_to_transpose_matrix,
    read_address         => read_address_matrix_transpose,
    output_matrix_values => output_data_matrix_transpose);

  mux2to1_matrix_orginal : mux2to1 port map (
    sel        => enable_matrix_original,
    input1     => control_matrix_original_to_mux_input1,
    input2     => read_address_matrix_original,
    mux_output => mux_output_to_original_matrix);

end structure_transpose_matrix_module;


C.9  Pseudoinverse 


Listing  C.9:  The  PseudoinverseModule.vhd  VHDL  file. 

(appendix3/PseudoinverseModule.vhd) 


-- Capt. Jason Shirley
-- This is the PseudoinverseModule

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.numeric_std.all;
--use ieee.std_logic_signed.all;
--use ieee.std_logic_arith.all;

entity xresultbooth_module is
  generic (address_width : integer := 8;
           data_width    : integer := 32);
  port (clk                         : in  std_logic;
        reset                       : in  std_logic;
        enable_matrixAtA            : in  std_logic;  -- load A matrix value
        enable_AtA_module           : in  std_logic;  -- start AtA
        enable_matrixBvector        : in  std_logic;  -- load B matrix value
        read_address_matrixA        : in  signed(address_width - 1 downto 0);
        read_address_matrixBvectors : in  signed(address_width - 1 downto 0);
        read_address_matrixX        : in  signed(address_width - 1 downto 0);
        input_matrixA_values        : in  signed(data_width - 1 downto 0);
        input_matrixBvectors_values : in  signed(data_width - 1 downto 0);
        output_data_matrixX         : out signed(data_width - 1 downto 0);
        enable_finished             : out std_logic
  );
end xresultbooth_module;

architecture structure_xresultbooth_module of xresultbooth_module is


  component yresultbooth_module is
    generic (address_width : integer := 8;
             data_width    : integer := 32);
    port (clk                         : in  std_logic;
          reset                       : in  std_logic;
          enable_matrixAtA            : in  std_logic;  -- load A matrix value
          enable_AtA_module           : in  std_logic;  -- start AtA
          enable_matrixBvector        : in  std_logic;  -- load B matrix value
          read_address_matrixA        : in  signed(address_width - 1 downto 0);
          read_address_matrixBvectors : in  signed(address_width - 1 downto 0);
          read_address_matrixU        : in  signed(address_width - 1 downto 0);
          read_address_matrixY        : in  signed(address_width - 1 downto 0);
          input_matrixA_values        : in  signed(data_width - 1 downto 0);
          input_matrixBvectors_values : in  signed(data_width - 1 downto 0);
          output_data_matrixY         : out signed(data_width - 1 downto 0);
          output_data_matrixU         : out signed(data_width - 1 downto 0);
          enable_finished             : out std_logic
    );
  end component yresultbooth_module;
  component xcontrolbooth is
    generic (address_width : integer := 8;
             data_width    : integer := 32);
    port (clk                         : in  std_logic;
          reset                       : in  std_logic;
          enable_control              : in  std_logic;
          enable_predivider_register  : out std_logic;
          enable_postdivider_register : out std_logic;
          enable_checkneg             : out std_logic;
          enable_divider              : out std_logic;
          mux_load_divider            : out std_logic;
          muxU                        : out std_logic;
          muxUXY                      : out signed(1 downto 0);
          writeEnable_matrixX         : out std_logic;
          enable_finished             : out std_logic;
          execute                     : out std_logic;
          read_address_matrixU        : out signed(address_width - 1 downto 0);
          write_address_matrixX       : out signed(address_width - 1 downto 0);
          read_address_matrixY        : out signed(address_width - 1 downto 0)
    );
  end component xcontrolbooth;


  component reciprocal_top Is
    Generic (high_bit : natural := 31; fraction_size : natural := 16);
    Port (clk, reset, load, mux_control : In std_logic;
          data_in  : In  signed(high_bit Downto 0);  --signed
          data_out : Out signed(high_bit Downto 0);  --signed
          overflow : Out std_logic);
  end component reciprocal_top;

  component mux2to1data_width is
    generic (data_width : integer := 32);
    port (sel        : in  std_logic;
          input1     : in  signed(data_width - 1 downto 0);
          input2     : in  signed(data_width - 1 downto 0);
          mux_output : out signed(data_width - 1 downto 0)
    );
  end component mux2to1data_width;


  component mux3to1data_width is
    generic (data_width : integer := 32);
    port (sel        : in  signed(1 downto 0);
          input1     : in  signed(data_width - 1 downto 0);
          input2     : in  signed(data_width - 1 downto 0);
          input3     : in  signed(data_width - 1 downto 0);
          mux_output : out signed(data_width - 1 downto 0)
    );
  end component mux3to1data_width;

  component mem_matrix is
    generic (address_width : integer := 8;
             data_width    : integer := 32);
    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          writeEnable          : in  std_logic;
          matrix_values        : in  signed(data_width - 1 downto 0);
          write_address        : in  signed(address_width - 1 downto 0);
          read_address         : in  signed(address_width - 1 downto 0);
          output_matrix_values : out signed(data_width - 1 downto 0)
    );
  end component mem_matrix;

  component booth2bit is
    generic (data_width : integer := 32;
             shift_size : integer := 8);
    port (clk                  : in  std_logic;
          reset                : in  std_logic;
          execute              : in  std_logic;
          input_matrixA_values : in  signed(data_width - 1 downto 0);
          input_matrixB_values : in  signed(data_width - 1 downto 0);
          mult_ready           : out std_logic;
          booth2bit_output     : out signed(data_width - 1 downto 0);
          mult_done            : out std_logic
    );
  end component booth2bit;


  component checkneg is
    generic (data_width : integer := 32);
    port (clk          : in  std_logic;
          enable       : in  std_logic;
          matrix_value : in  signed(data_width - 1 downto 0);
          output_value : out signed(data_width - 1 downto 0)
    );
  end component checkneg;


  component adder is
    generic (data_width : integer := 32);
    port (input_mult_data : in  signed(data_width - 1 downto 0);
          input_reg_data  : in  signed(data_width - 1 downto 0);
          output_data     : out signed(data_width - 1 downto 0)
    );
  end component adder;


  component dff is
    generic (data_width : integer := 32);
    port (clk         : in  std_logic;
          enable      : in  std_logic;
          input_data  : in  signed(data_width - 1 downto 0);
          output_data : out signed(data_width - 1 downto 0)
    );
  end component dff;


  signal enable_finished_to_xcontrol,
         enable_predividercontrol_to_predivider_dff : std_logic;
  signal enable_postdivider_registercontrol_to_postdivider_dff : std_logic;
  signal enable_divider_to_load, mux_load_divider_to_mux_control : std_logic;
  signal execute_to_execute, muxU_to_muxU,
         writeEnable_matrix_to_write_memoryX : std_logic;
  signal muxUXY_to_muxUXY : signed(1 downto 0);
  signal output_data_to_data_input, data_out_to_post_divider_dff :
         signed(data_width - 1 downto 0);
  signal output_data_to_mux_input1,
         output_data_matrixU_to_predivider_dff : signed(data_width - 1 downto 0);
  signal read_address_matrixU_control_to_read_addres_matrixU :
         signed(address_width - 1 downto 0);
  signal read_address_matrixY_control_to_read_addres_matrixY :
         signed(address_width - 1 downto 0);
  signal write_address_matrixX_control_to_write_address_matrixX :
         signed(address_width - 1 downto 0);
  signal mux_output_to_boothinputA, output_data_matrixY_to_muxinput1 :
         signed(data_width - 1 downto 0);
  signal booth2bit_output_to_input_value_memoryX :
         signed(data_width - 1 downto 0);
  signal mux_output_to_booth_inputB, output_data_to_muxinput3 :
         signed(data_width - 1 downto 0);
  signal checkneg_output_value_to_input_data_dff,
         dff_output_to_adder_inputA : signed(data_width - 1 downto 0);
  signal enable_checkneg_to_checkneg : std_logic;


begin 

  yresultbooth_module1 : yresultbooth_module port map (clk => clk,
    reset => reset,
    enable_matrixAtA => enable_matrixAtA,
    enable_AtA_module => enable_AtA_module,
    enable_matrixBvector => enable_matrixBvector,
    read_address_matrixA => read_address_matrixA,
    read_address_matrixBvectors => read_address_matrixBvectors,
    read_address_matrixU =>
      read_address_matrixU_control_to_read_addres_matrixU,
    read_address_matrixY =>
      read_address_matrixY_control_to_read_addres_matrixY,
    input_matrixA_values => input_matrixA_values,
    input_matrixBvectors_values => input_matrixBvectors_values,
    output_data_matrixY => output_data_matrixY_to_muxinput1,
    output_data_matrixU => output_data_matrixU_to_predivider_dff,
    enable_finished => enable_finished_to_xcontrol);

  xcontrolbooth1 : xcontrolbooth port map (clk => clk,
    reset => reset,
    enable_control => enable_finished_to_xcontrol,
    enable_predivider_register =>
      enable_predividercontrol_to_predivider_dff,
    enable_postdivider_register =>
      enable_postdivider_registercontrol_to_postdivider_dff,
    enable_checkneg => enable_checkneg_to_checkneg,
    enable_divider => enable_divider_to_load,
    mux_load_divider => mux_load_divider_to_mux_control,
    muxU => muxU_to_muxU,
    muxUXY => muxUXY_to_muxUXY,
    writeEnable_matrixX => writeEnable_matrix_to_write_memoryX,
    enable_finished => enable_finished,
    execute => execute_to_execute,
    read_address_matrixU =>
      read_address_matrixU_control_to_read_addres_matrixU,
    write_address_matrixX =>
      write_address_matrixX_control_to_write_address_matrixX,
    read_address_matrixY =>
      read_address_matrixY_control_to_read_addres_matrixY);

  reciprocal_top1 : reciprocal_top port map (clk => clk,
    reset => reset,
    load => enable_divider_to_load,
    mux_control => mux_load_divider_to_mux_control,
    data_in => output_data_to_data_input,
    data_out => data_out_to_post_divider_dff,
    overflow => open);

  mux2to1data_widthY : mux2to1data_width port map
    (sel => muxU_to_muxU,
    input1 => output_data_to_mux_input1,
    input2 => output_data_matrixU_to_predivider_dff,
    mux_output => mux_output_to_boothinputA);

  mux3to1data_widthUXY : mux3to1data_width port map
    (sel => muxUXY_to_muxUXY,
    input1 => output_data_matrixY_to_muxinput1,
    input2 => booth2bit_output_to_input_value_memoryX,
    input3 => output_data_to_muxinput3,
    mux_output => mux_output_to_booth_inputB);

  mem_matrixX : mem_matrix port map (clk => clk,
    reset => reset,
    writeEnable => writeEnable_matrix_to_write_memoryX,
    matrix_values => booth2bit_output_to_input_value_memoryX,
    write_address =>
      write_address_matrixX_control_to_write_address_matrixX,
    read_address => read_address_matrixX,
    output_matrix_values => output_data_matrixX);

  booth2bit1 : booth2bit port map (clk => clk,
    reset => reset,
    execute => execute_to_execute,
    input_matrixA_values => mux_output_to_boothinputA,
    input_matrixB_values => mux_output_to_booth_inputB,
    mult_ready => open, --
    booth2bit_output => booth2bit_output_to_input_value_memoryX,
    mult_done => open); --

  adder1 : adder port map (
    input_mult_data => dff_output_to_adder_inputA,
    input_reg_data => output_data_matrixY_to_muxinput1,
    output_data => output_data_to_muxinput3);

  checkneg1 : checkneg port map (clk => clk,
    enable => enable_checkneg_to_checkneg,
    matrix_value => booth2bit_output_to_input_value_memoryX,
    output_value => checkneg_output_value_to_input_data_dff);

  dffpredivider : dff port map (clk => clk,
    enable => enable_predividercontrol_to_predivider_dff,
    input_data => output_data_matrixU_to_predivider_dff,
    output_data => output_data_to_data_input);

  dffpostdivider : dff port map (clk => clk,
    enable =>
      enable_postdivider_registercontrol_to_postdivider_dff,
    input_data => data_out_to_post_divider_dff,
    output_data => output_data_to_mux_input1);

  dffpostcheckneg : dff port map (clk => clk,
    enable =>
      enable_postdivider_registercontrol_to_postdivider_dff,
    input_data => checkneg_output_value_to_input_data_dff,
    output_data => dff_output_to_adder_inputA);

end structure_xresultbooth_module;